SAS tips

When working with large data files, such as the NES, GSS, Eurobarometer, and the U.S. Supreme Court Database, there are several ways to write your SAS program so that it runs more efficiently, cutting down on the computer resources and time it takes to run.


Use only those variables you need.

Use the KEEP= or DROP= data set options on your SET statement to reduce the size of the data set.

ex.

libname sasdata 'u:\polisci\yourfolder';
data new;
set sasdata.nes (keep=v10-v30 v53 v67);

-OR-

set sasdata.nes (drop=v1-v9 v14 v28 v164);
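
As a rough sketch of how this might look in a complete step, the program below reads only the needed variables and then leaves one of them out of the output. The variable names come from the example above; the data set name small and the choice of v53 as the variable to drop are hypothetical, picked only for illustration.

ex.

libname sasdata 'u:\polisci\yourfolder';
data small;
   /* KEEP= on the input data set: only these variables are read into the step */
   set sasdata.nes (keep=v10-v30 v53 v67);
   /* DROP statement: v53 can be used inside the step but is not written to small */
   drop v53;
run;

Putting KEEP= or DROP= on the input data set is usually the more efficient choice, because the unwanted variables are never read into the step at all.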



Use fewer data steps

With the WHERE and BY statements, you can accomplish in one step what might otherwise take several.

ex.

data new; set sasdata.nes; where year=87; run;
-OR-
proc freq data=one; where year=87; tables sex partyid; run;
-OR-
proc freq data=one; by agegroup; tables sex partyid; run;

In the first example, you create a data set called new that contains only those observations in which the variable 'year' has the value 87. The second and third examples show that you can also use WHERE and BY statements within procedures. Example 2 produces frequency counts for the variables 'sex' and 'partyid' using only the observations that meet the condition year=87. Example 3 produces a separate set of frequency counts for those variables for each value of the variable agegroup; for this to work, the data set must already be sorted by agegroup.
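
BY-group processing expects the input to be sorted (or indexed) by the BY variable. A minimal sketch of the third example with the sort included, assuming the data set one and the variables used above; the output name sorted is hypothetical:

ex.

proc sort data=one out=sorted;
   by agegroup;
run;

proc freq data=sorted;
   by agegroup;          /* one set of tables per value of agegroup */
   where year=87;        /* WHERE and BY can be combined in the same step */
   tables sex partyid;
run;

Writing the sorted copy to a separate data set with OUT= leaves the original order of one untouched.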




Test your program first

When testing programs for errors, reduce the number of observations you use with the OBS= data set option. You can do this either in your data step or in the PROC statement itself.

ex.

libname sasdata 'u:\polisci\yourfolder';
data new;
set sasdata.nes (obs=100);
-OR-
proc freq data=sasdata.nes(obs=100);
-OR-
proc reg data=sasdata.nes(obs=100);
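
If the same limit should apply to every step while you debug, the OBS= system option is one alternative. The sketch below is an illustration rather than part of the examples above; PROC MEANS is used only because it needs no variable names.

ex.

libname sasdata 'u:\polisci\yourfolder';

options obs=100;      /* every data set read from here on stops after 100 observations */

proc means data=sasdata.nes;
run;

options obs=max;      /* restore the default before the real run */

Remember to reset the option; otherwise later steps in the same session will also see only the first 100 observations.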




"Else if" statements

SAS evaluates every "if ... " statement in a data step for every observation, checking whether the observation meets the condition you've set and, if it does, performing the operation you specify. In the social sciences, researchers usually use IF statements to create new variables or recode old ones. Assume, for instance, that you have a variable, ideology, that is measured on a five-point scale and you want to collapse it into a three-point scale (liberals, moderates and conservatives). The inefficient way to do it uses a separate IF statement for each condition in the recode.

if ideology=1 or ideology=2 then ideology=1;
if ideology=3 then ideology=2;
if ideology=4 or ideology=5 then ideology=3;

In this case, SAS checks all three IF conditions for every observation in your data, even when an earlier condition has already been met. If the observation meets the condition set out in the IF portion of a statement, SAS performs the operation described in the THEN clause; if not, it moves on to the next statement.

By using "ELSE IF" instead of "IF", however, you can cut down on the time it takes SAS to perform the operation.

if ideology=1 or ideology=2 then ideology=1;
else if ideology=3 then ideology=2;
else if ideology=4 or ideology=5 then ideology=3;

SAS still checks the first IF condition for every observation, but it evaluates each ELSE IF condition only for those observations that did not meet an earlier one. With a large data file, this can make a noticeable difference in run time.
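
A variation on the recode above, offered as a sketch rather than the only way to do it, writes the collapsed scale to a new variable and adds a final ELSE for anything unexpected. The names recoded and ideo3 are hypothetical, and the example assumes that sasdata.nes contains the ideology variable and that codes 1 and 2 are the liberal end of the scale.

ex.

data recoded;
   set sasdata.nes;
   /* ideo3 is a hypothetical new variable; ideology itself is left unchanged */
   if ideology in (1, 2) then ideo3 = 1;         /* liberals */
   else if ideology = 3 then ideo3 = 2;          /* moderates */
   else if ideology in (4, 5) then ideo3 = 3;    /* conservatives */
   else ideo3 = .;                               /* missing or out-of-range codes */
run;

Keeping the original variable also guards against a problem that separate IF statements can cause when recoding in place: a value changed by one statement may match the condition of a later one.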

Finally, use system files.