Excel to SAS
Note: This example does NOT contain actual subject data.
It's not unusual in small pilot studies that investigators use Microsoft Excel to manage their data. I see this in about 30% of the projects I work on. It's also not unusual that principal investigators (PIs) or their staff choose interesting ways in which to format their data in Excel.
As an example, I often receive datasets that are formatted in what I call a "log format". Here, each subject's data is placed within one tab in excel and the data are formatted in the style of a report.

This format is fine for printing for review during lab meetings, but it is not "SAS-friendly". In the example above there are at least four areas of concern:
1. Some variables such as Subject Number, Treatment and Gender are not listed in a format that is readily accessible by SAS.
2. Statistics are calculated via Excel. It is generally not advisable to use Excel as a statistical tool*.
3. There are inconsistencies in formatting. Here, there is a space between the end of the data and the mean, in the group above, no space exists. We must be careful to identify and rectify problems as part of our data cleaning process.
4. Subject data are in a separate tabs, and key variables may or may NOT be located in the same rows or columns on subsequent tabs.
So, next, let's look at how I've reformatted this data for reading by SAS.

1. All data are combined on a single tab.
2. Subject's data are organized such that there are a minimum number of columns; repeated measures data are stored in separate rows; string (text) data have been recoded as numeric data.
3. Using Excel's comment feature, we have annotated columns where necessary to document important formatting attributes.
Once we've completed this, we can write our SAS code to read the Excel data file.

And here is the SAS output that provides a listing of the data and by-treatment means.


In summary, we see that preparing Excel data for importation into SAS is a straightforward procedure. It does requires attention to detail and inconsistencies in how the data are set up in Excel must be addressed. Finally, once imported into SAS, statistics are easily (and more accurately) generated using standard SAS routines.
* See these articles that discuss the use of Excel as a statistical tool:
1. On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. B.D. McCullough and B. Wilson, (2002), Computational Statistics & Data Analysis, 40, pp 713 - 721
2. On the accuracy of statistical procedures in Microsoft Excel ‘97.
B.D. McCullough and B. Wilson, (1999), Computational Statistics & Data Analysis, 31, pp 27-37
3. Problems with using Microsoft Excel for statistics.
J.D. Cryer, (2001), presented at the Joint Statistical Meetings, American Statistical Association, 2001, Atlanta Georgia
4. Using Excel for statistical data analysis. Eva Goldwater, (1999), Univ. of Massachusetts Office of Information Technology
5. Statistical analysis using Microsoft Excel.
Jeffrey Simonoff, (2002)
6. Statistical flaws in Excel. Hans Pottel
- Login to post comments