Wednesday, December 22, 2010

The difference of 'where' and 'if' in SAS

A graph may explain everything


*** The general rule when comparing IF statement and WHERE statement is that when reading data from datasets, SAS first select observations based on the WHERE condition and then take other actions, while the IF statement would have to wait their turn. Using the WHERE statement may improve the efficiency of your SAS programs because SAS is not required to read all observations from the input data set.;

data test1;
     set sashelp.class;
     if sex="M";
run;

/*
log:
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.TEST1 has 10 observations and 5 variables.
*/

data test2;
    set sashelp.class;
    where sex="M";
run;

/*log:
NOTE: There were 10 observations read from the data set SASHELP.CLASS.
      WHERE sex='M';
NOTE: The data set WORK.TEST2 has 10 observations and 5 variables.
*/

*** When using two or more WHERE statements in the same DATA step, then which comes last rules.
     When using two or more IF statements, both(all) of them take effect;

*** Both of the two IF statements are applied to the data, TEST3 contains only female students whose age is less than 14;
data test3;
     set sashelp.class;
     if sex="F";
     if age<14;
run;

*** Only the latter WHERE statement takes effect and dataset test4 contains all students with age lt 14, irrespective of their gender;
data test4;
    set sashelp.class;
    where sex="F";
    where age<14;
run;

*** Check out the following example;

data class;
     set sashelp.class;
run;

*** Using FIRSTOBS= and OBS= data set option;
*** SAS selects 9 records (from the second through the tenth);
data class1;
     set sashelp.class(firstobs=2 obs=10);
run;

*** Using WHERE statement;
*** SAS first select all female records, then try to set the 2-10 observations.;
data class2;
     set sashelp.class(firstobs=2 obs=10);
     where sex="F";
run;

*** Using IF statement;
*** SAS first select the 2-10 records, then select Female students;
data class3;
     set sashelp.class(firstobs=2 obs=10);
     if sex="F";
run;


*** Another example about IF condition and WHERE condition;
data temp;
     do x=1 to 10;
        output;
     end;
run;

*** Where condition;
data temp1;
    set temp;
    where '0';
run;

*** IF condition;
data temp2;
    set temp;
    if '0';
run;

*** The WHERE statement evaluated the '0' as true and selected all observations, while the IF statement evaluated '0' as false and select no observations.

Use if to create several data sets
    data a b;
    input sex$ country$;
    if sex="f" then output a;
    if sex="m" then output b;
    cards;
    f china
    m china
    f america
    m australia
;

From the following, we can find the difference of executing speed difference. Using WHERE, SAS only reads data satisfied the WHERE condition. When using IF, SAS will read all data first and then select the data satisfied the IF condition. That is, WHERE is more efficient than IF if either of them can be used.

1 data a;
2 set sashelp.class;
3 if sex='F';
4 run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.A has 9 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.35 seconds
cpu time 0.00 seconds

5
6 data b;
7 set sashelp.class;
8 where sex='F';
9 run;

NOTE: There were 9 observations read from the data set SASHELP.CLASS.
WHERE sex='F';
NOTE: The data set WORK.B has 9 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds

No comments:

Post a Comment