Reading data into SAS

reading and writing SAS data sets

1) Tell SAS what to call the data.

Each time you create a dataset you must use the DATA statement to name it. The syntax is:

DATA dataname;
where dataname can be up to 8 characters long (32 in SAS version 7 and later), must begin with a letter or an underscore, and can otherwise include letters, numbers, and underscores.

2) Tell SAS where the data are.

If you are using an external data file (usually an ASCII file that contains nothing but the raw data you wish to analyze with SAS), you will need to use an INFILE statement:

INFILE 'file location';
The format of the file location depends on the platform on which you're running SAS. If, for example, you're working on our Windows NT network, the statement will look like this:

INFILE 'u:\polisci\yourfoldername\...';
where yourfoldername is your folder on the artsci network, and the ellipsis stands for the folder you've created to hold your data, followed by the name of the data file. On the Convex machine, which is Unix-based, the statement will look different. If the data file is located in the same directory as the program you're creating, all you need to do is put the file name in the quotes:

INFILE 'filename';
If the file is located elsewhere, you'll have to use the appropriate Unix command to change to the directory where the file is located before running your program.
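To show where the INFILE statement sits in a program, here is a minimal sketch of a data step that reads an external file. The data set name mydata and the file name rawdata.dat are hypothetical, and the sketch assumes the file is in the current directory and holds one character value followed by two numeric values on each line. The INPUT statement it uses is explained in step 3.

```sas
/* Minimal sketch: read a raw data file into a SAS data set.       */
/* mydata and rawdata.dat are hypothetical names used for          */
/* illustration; substitute your own data set name and file.       */
DATA mydata;
    INFILE 'rawdata.dat';      /* where the raw data are           */
    INPUT vara $ varb varc;    /* what the raw data are            */
RUN;
```

The INFILE statement comes before the INPUT statement, so that SAS knows where to look by the time it is told what to read.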

Generally speaking, it is best to keep your data in a separate file and identify it with an INFILE statement. Sometimes, however, you might wish to enter the data in the same file as your SAS program. In this case you'll have no INFILE statement, and you'll list your data following a CARDS; statement, which comes immediately after the INPUT statement discussed next.

3) Tell SAS what the data are.

A raw datafile is just that. We call it raw because it simply contains numbers, often lots of numbers. Those numbers, of course, have significance to you, the researcher: some represent GNP, for example, while others may represent the percentage of the vote that the conservative party received in the most recent parliamentary election. The INPUT statement tells SAS which is which. There are two formats for the INPUT statement; which you use depends upon what your data look like.

List (or List Directed) Input is used when the data have been entered without regard to column, but where the values of the variables for each observation are separated by spaces. In the INPUT statement, then, each variable's name is listed, separated by a space. Character variables are signified with a dollar sign ($) following the name, as in the variable vara below:

INPUT vara $ varb varc ...;

This statement tells SAS that in each line of data, the first value it comes across belongs to vara (a character variable, as marked by the dollar sign), the next to varb, the next to varc, and so on, all of which are numeric variables.
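A small, self-contained sketch of list input with in-line data may help. The data set name, variable names, and values below are invented for illustration only; country is a character variable, while gnp and vote are numeric.

```sas
/* List input with in-line data. All names and values here are     */
/* made up for illustration.                                       */
DATA election;
    INPUT country $ gnp vote;   /* $ marks country as character    */
    CARDS;
France 1250 42.5
Norway 310 37.1
Italy 1100 29.8
;
RUN;

/* Print the data set to check that it was read as intended */
PROC PRINT DATA=election;
RUN;
```

Because the values are separated by spaces, they do not need to line up in columns. Note that with simple list input a character value cannot itself contain a space, and by default only its first 8 characters are kept.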

Sometimes, though, your data will simply look like a solid block of numbers with no spaces between the variables, or at least not between every variable. Column input handles this case: the INPUT statement gives, for each variable you name, the columns in which SAS will find it:

INPUT vara <$> 1-4 <.2> ... ;
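In this statement, variable vara is found in columns 1 through 4 (if the variable is only one column wide, you don't need an ending column number). The items in angle brackets are optional, and the brackets themselves are not typed. The $ marks vara as a character (text) variable. The .2 applies only to a numeric variable: it tells SAS that, when a value contains no decimal point of its own, its last two digits come after the decimal point. A given variable would therefore use one or the other, not both.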
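The following sketch shows column input on a solid block of data. The names and values are hypothetical: id is a character variable in columns 1-4, vote is a numeric variable in columns 5-8 with two implied decimal places, and year is a numeric variable in columns 9-12.

```sas
/* Column input with in-line data. All names and values here are   */
/* made up for illustration.                                       */
DATA results;
    INPUT id $ 1-4      /* character variable, columns 1-4         */
          vote 5-8 .2   /* numeric, last two digits are decimals   */
          year 9-12;    /* numeric, columns 9-12                   */
    CARDS;
A00142501996
B00237101992
;
RUN;

/* Print the data set to check that it was read as intended */
PROC PRINT DATA=results;
RUN;
```

In the first data line, the digits 4250 in columns 5-8 contain no decimal point, so the .2 causes vote to be read as 42.50 rather than 4250.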

With each data step you perform, SAS creates a temporary work data file; at the end of your job, this file is deleted. Often it's easier to create a permanent SAS data set.
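One common way to make a data set permanent is to assign a library reference with a LIBNAME statement and then give the data set a two-level name. The sketch below is only an illustration: the libref mylib, the data set name, and the values are hypothetical, and the path '.' assumes you want the file stored in the current directory.

```sas
/* Sketch: store a data set permanently instead of in WORK.        */
/* mylib is a hypothetical libref; '.' assumes the current         */
/* directory is where the permanent file should be kept.           */
LIBNAME mylib '.';

DATA mylib.election;            /* two-level name: libref.dataname */
    INPUT country $ gnp vote;
    CARDS;
France 1250 42.5
Norway 310 37.1
;
RUN;
```

In a later job, after issuing the same LIBNAME statement, the data set can be used directly (for example, PROC PRINT DATA=mylib.election;) without reading the raw data again.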