Wednesday, November 16, 2011

Did you find the difference? _N_ study notes 1

Look at the differences between data set a1, a2 and a3:

data a00;
      input a b;
      cards;
      2 2
      3 3
      5 5
      ;
run;

data a01;
      x=1;
run;

data a1;
      if _n_=1 then do;
            set a01;
      end;
      set a00;
run;

data a2;
      if _n_=1 then do;
            x=1;
      end;
      set a00;
run;

data a3;
      retain x;
      if _n_=1 then do;
            x=1;
      end;
      set a00;
run;
 Things to remember:

1)  a1 v.s. a2: in " if then do end", we use set a01, the value of x will be retained in each iteration of data steps; if you put 'set' in data statement, the value will be retained automatically;

2) a2 v.s. a3: if the difference of use retain or not. use retain, the value of x is retained in each data step;

3) a1 v.s. a3: it looks there is no difference. however, the internal is different. if you remove set a00 both and you will find the new a1 has two obs while new a3 only has one obs.

step-by-step study of:
data a1;
      if _n_=1 then do;
            set a01;
      end;
      set a00;
run;


1) first step of data a1: _n_=1, read in set a01 and retain the value of x=1; goes down and read the first value of set a00; combined together we get  (x a b)=(1 2 2)
2) second step of a1: _n_=2, without reading set a01 but just the second row of a00, since x=1 is retained, so we get (x a b)
               (1 2 2)
               (2 3 3) 
3) in the same way for _n_=3;
4) forth step of data a1, _n_=4, but since a00 has only three records, jump out of data step; at last we got three rows as the result shows
5) if there is no set a00, then how many obs will we get? (the answer is 2)

Did you find the difference?

Look at the differences between data set a1, a2 and a3:

data a00;
      input a b;
      cards;
      2 2
      3 3
      5 5
      ;
run;

data a01;
      x=1;
run;

data a1;
      if _n_=1 then do;
            set a01;
      end;
      set a00;
run;

data a2;
      if _n_=1 then do;
            x=1;
      end;
      set a00;
run;

data a3;
      retain x;
      if _n_=1 then do;
            x=1;
      end;
      set a00;
run;
 Things to remember:

1)  a1 v.s. a2: in " if then do end", we use set a01, the value of x will be retained in each iteration of data steps; if you put 'set' in data statement, the value will be retained automatically;

2) a2 v.s. a3: if the difference of use retain or not. use retain, the value of x is retained in each data step;

3) a1 v.s. a3: it looks there is no difference. however, the internal is different. if you remove set a00 both and you will find the new a1 has two obs while new a3 only has one obs.

step-by-step study of:
data a1;
      if _n_=1 then do;
            set a01;
      end;
      set a00;
run;


1) first step of data a1: _n_=1, read in set a01 and retain the value of x=1; goes down and read the first value of set a00; combined together we get  (x a b)=(1 2 2)
2) second step of a1: _n_=2, without reading set a01 but just the second row of a00, since x=1 is retained, so we get (x a b)
               (1 2 2)
               (2 3 3) 
3) in the same way for _n_=3;
4) forth step of data a1, _n_=4, but since a00 has only three records, jump out of data step; at last we got three rows as the result shows
5) if there is no set a00, then how many obs will we get? (the answer is 2)

Sunday, November 13, 2011

行和列一样的两个data set对应元素相减做差

mitbbs上看到的一个问题,两个data set (pre, post),行和列都一样,要做的是对两个data set对应的值做差。

/*
原作者问题:
今天遇到一个很简单的case,用datastep能够解决,不过感觉太麻烦了,我觉得SQL该能很方便的解决这个问题,而平时自己sql又不熟悉,只好上版上来找好心人问问了。问题很简单,我有两个一摸一样的datasetprepost,我要对20多个variablepost-pre,如果用datastep的话,每个dataset里面的variable都要改名加上后缀prepost,然后要再merge到一个大的dataset里面,然后还要计算diff,感觉操作起来非常复杂。我想做得是用第一个post dataset 整个减掉 pre dataset 里面每个variable 对应
的值,请问要怎么操作呢? 我来编个dataset吧!假设

pre

id   measure1 measure2 measure3
1       20               30             40
2       50               60             70

post

id   measure1 measure2 measure3
1       15               20             10
2       20               40             45

我想得到如下的dataset
diff
id   measure1 measure2 measure3
1        5                10             30
2       30               20             25

*/



/*  俺给的解答  */
data pre;
input id   measure1 measure2 measure3;
cards;
1       20               30             40
2       50               60             70
;

data post;
input id   measure1 measure2 measure3 ;
cards;
1       15               20             10
2       20               40             45
;
run;

%macro selectdif(maxindex);
      %do i=1 %to &maxindex;
            pre.measure&i-post.measure&i as dif&i
            %if &i ne &maxindex %then ,
            ;
      %end;
%mend;

proc sql;
      select pre.id, %selectdif(maxindex=3)
      from pre, post
      where pre.id=post.id;
quit;