Friday, December 30, 2011

proc freq v.s. proc sql

sasuser.sale2000 has variable origin and dest, if we want to list the unique combination of these two variables, we can use proc sql; select distinct ... Another method is using proc freq. It not only gives the no duplicated combination, but also sorts the result by the tables variables. The sas code is:


options formdlim='-' ;
proc sql;
    create table sqlrst as
    select distinct origin, dest
      from sasuser.sale2000
      order by origin, dest;
quit;

proc print data=sqlrst width=min;
run;



proc freq data=sasuser.sale2000 noprint;
    tables origin*dest / list out=freq1;
run;

proc print data=freq1(drop=count percent) width=min;
run;

Wednesday, November 16, 2011

Did you find the difference? _N_ study notes 1

Look at the differences between data set a1, a2 and a3:

data a00;
      input a b;
      cards;
      2 2
      3 3
      5 5
      ;
run;

data a01;
      x=1;
run;

data a1;
      if _n_=1 then do;
            set a01;
      end;
      set a00;
run;

data a2;
      if _n_=1 then do;
            x=1;
      end;
      set a00;
run;

data a3;
      retain x;
      if _n_=1 then do;
            x=1;
      end;
      set a00;
run;
 Things to remember:

1)  a1 v.s. a2: in " if then do end", we use set a01, the value of x will be retained in each iteration of data steps; if you put 'set' in data statement, the value will be retained automatically;

2) a2 v.s. a3: if the difference of use retain or not. use retain, the value of x is retained in each data step;

3) a1 v.s. a3: it looks there is no difference. however, the internal is different. if you remove set a00 both and you will find the new a1 has two obs while new a3 only has one obs.

step-by-step study of:
data a1;
      if _n_=1 then do;
            set a01;
      end;
      set a00;
run;


1) first step of data a1: _n_=1, read in set a01 and retain the value of x=1; goes down and read the first value of set a00; combined together we get  (x a b)=(1 2 2)
2) second step of a1: _n_=2, without reading set a01 but just the second row of a00, since x=1 is retained, so we get (x a b)
               (1 2 2)
               (2 3 3) 
3) in the same way for _n_=3;
4) forth step of data a1, _n_=4, but since a00 has only three records, jump out of data step; at last we got three rows as the result shows
5) if there is no set a00, then how many obs will we get? (the answer is 2)

Did you find the difference?

Look at the differences between data set a1, a2 and a3:

data a00;
      input a b;
      cards;
      2 2
      3 3
      5 5
      ;
run;

data a01;
      x=1;
run;

data a1;
      if _n_=1 then do;
            set a01;
      end;
      set a00;
run;

data a2;
      if _n_=1 then do;
            x=1;
      end;
      set a00;
run;

data a3;
      retain x;
      if _n_=1 then do;
            x=1;
      end;
      set a00;
run;
 Things to remember:

1)  a1 v.s. a2: in " if then do end", we use set a01, the value of x will be retained in each iteration of data steps; if you put 'set' in data statement, the value will be retained automatically;

2) a2 v.s. a3: if the difference of use retain or not. use retain, the value of x is retained in each data step;

3) a1 v.s. a3: it looks there is no difference. however, the internal is different. if you remove set a00 both and you will find the new a1 has two obs while new a3 only has one obs.

step-by-step study of:
data a1;
      if _n_=1 then do;
            set a01;
      end;
      set a00;
run;


1) first step of data a1: _n_=1, read in set a01 and retain the value of x=1; goes down and read the first value of set a00; combined together we get  (x a b)=(1 2 2)
2) second step of a1: _n_=2, without reading set a01 but just the second row of a00, since x=1 is retained, so we get (x a b)
               (1 2 2)
               (2 3 3) 
3) in the same way for _n_=3;
4) forth step of data a1, _n_=4, but since a00 has only three records, jump out of data step; at last we got three rows as the result shows
5) if there is no set a00, then how many obs will we get? (the answer is 2)