Thursday, January 27, 2011

zz (repost): Change a variable's value within runs of consecutive values under special conditions

This question was answered by other people, not by me.

The data have two variables of interest, high and p_sm. The goal: wherever high has a run of consecutive 1s or consecutive -1s, compare the corresponding p_sm values. Within a run of 1s, keep the 1 on the observation with the maximum p_sm and set high to missing (.) on the others; within a run of -1s, keep the -1 on the observation with the minimum p_sm and set high to missing (.) on the others. For example, the first and second observations are consecutive 1s; since 102.88 < 105.29, the second 1 is changed to missing. In other words, high needs to become missing on observations 2, 14, 16, 20, and so on.
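To try the two methods that follow, a small test data set helps. The sketch below is a made-up sample, not the original data: only the first two p_sm values (105.29 and 102.88) come from the example above, and the rest are invented. It writes to WORK.HAVE, so the set tmp1.have statements would need to read have instead, or a tmp1 libref would have to be assigned first.

    * hypothetical sample data, values invented for illustration;
    data have;
        input high p_sm;
        datalines;
    1 105.29
    1 102.88
    -1 101.50
    -1 99.75
    1 104.10
    ;
    run;

With this sample, observations 2 and 3 should end up with high missing: 102.88 is not the maximum of the first run of 1s, and 101.50 is not the minimum of the run of -1s.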


There are two methods given by elek:

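    * method 1: two DO UNTIL loops over each run of identical high values;
    data want;
        retain min max;
        * first pass: find the min and max of p_sm within the current run;
        do until (last.high);
            set tmp1.have;
            by high notsorted;
            if first.high then do;
                min=p_sm;
                max=p_sm;
            end;
            else do;
                min=min(min,p_sm);
                max=max(max,p_sm);
            end;
        end;
        * second pass: re-read the same run and set high to missing on the non-extreme rows;
        do until (last.high);
            set tmp1.have;
            by high notsorted;
            if high=1 and p_sm^=max then high=.;
            if high=-1 and p_sm^=min then high=.;
            output;
        end;
    run;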

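    * method 2: PROC MEANS with ID variables;
    * number the observations so the extreme rows can be matched back;
    data tmp1;
        set tmp1.have;
        obs=_n_;
    run;
    * tmp2: with IDMIN, the ID values (p_sm, obs) come from the row with the smallest p_sm in each run;
    proc means data=tmp1 noprint idmin;
        by high notsorted;
        var p_sm;
        id p_sm obs;
        output out=tmp2(drop=_freq_ _type_) min(p_sm)=min max(p_sm)=max;
    run;
    * tmp3: without IDMIN, the ID values come from the row with the largest p_sm in each run;
    proc means data=tmp1 noprint;
        by high notsorted;
        var p_sm;
        id p_sm obs;
        output out=tmp3(drop=_freq_ _type_) min(p_sm)=min max(p_sm)=max;
    run;
    * only the extreme rows pick up min and max, so every other row in a run fails the test;
    data want;
        merge tmp1 tmp2 tmp3;
        by obs;
        if high=1 and max^=p_sm then high=.;
        if high=-1 and min^=p_sm then high=.;
        drop min max obs;
    run;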
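Method 2 is easier to follow after looking at the two summary data sets. As far as I understand the ID statement, tmp2 (built with IDMIN) carries the obs number of the row with the smallest p_sm in each run, and tmp3 carries the obs number of the row with the largest. The sketch below only prints them; it assumes method 2 has just been run in the same session, so tmp2 and tmp3 still exist in WORK. The titles are my own wording.

    * inspect which row each run contributed (assumes method 2 was just run);
    proc print data=tmp2 noobs;
        title "tmp2 (IDMIN): row with the smallest p_sm in each run";
    run;
    proc print data=tmp3 noobs;
        title "tmp3: row with the largest p_sm in each run";
    run;
    title;

The obs column shows where min and max get attached in the final merge. Rows whose obs appears in neither data set are left with missing min and max, so the inequality tests set their high to missing.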

1 comment:

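  1. The first method finds, for each run of adjacent 1s or -1s, the maximum and minimum of the corresponding p_sm, and then changes the value of high through a conditional comparison.

    The second method uses proc means, which I am really not familiar with.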