Monday, January 31, 2011

A good book for new Python learners

Head First Programming: A Learner's Guide to Programming Using the Python Language

Find the maximum length of a substring without repeated characters

The question: given a string such as 'abcabcbb', 'bbbbb' or 'abbcbba', what is the maximum length of a substring with no repeated characters? For 'abcabcbb', the answer is 3, the length of 'abc', 'bca' or 'cab'; for 'bbbbb', it is only 1; for 'abbcbba', it is 2.
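The post only states the problem. One common way to solve it is a sliding window. Here is a sketch in Python, the language of the book above. It remembers the last index where each character was seen. Whenever the current character already occurs inside the window, the left edge moves just past that earlier occurrence. The function name longest_unique_substring is my own, not from any library.

```python
def longest_unique_substring(s: str) -> int:
    """Length of the longest substring of s with no repeated character."""
    last_seen = {}  # character -> index of its most recent occurrence
    start = 0       # left edge of the current window
    best = 0
    for i, ch in enumerate(s):
        # If ch already occurs inside the window, move the left edge past it.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best


for s in ("abcabcbb", "bbbbb", "abbcbba"):
    print(s, longest_unique_substring(s))
# abcabcbb 3
# bbbbb 1
# abbcbba 2
```

Each character enters and leaves the window at most once, so the running time is linear in the length of the string.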

Thursday, January 27, 2011

How to read a sas7bdat file in the portable (no-install) version of SAS

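With an installed version of SAS, whether 9.1 or 9.2, you can open a sas7bdat file by double-clicking it. With the portable (no-install) SAS 9.1.3, however, opening the file directly does not work, and neither does importing it. Instead, you can do this: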

For example, suppose temp.sas7bdat is located at "c:\mydata\temp.sas7bdat". Then:

In SAS, submit:  libname mylib 'c:\mydata';

Then, in the SAS Explorer window, open Libraries. You will see mylib; click into it and you will see your data (temp).

zz: Change a variable's value within runs of consecutive equal values, under special conditions

This question was answered by someone else, not me.

The data looks like the following. Goal: for the variable high, whenever there are consecutive 1s or consecutive -1s, compare the corresponding values of p_sm. Within a run of 1s, keep the row with the maximum p_sm and set high to missing (.) on the other rows. Within a run of -1s, keep the row with the minimum p_sm and set the others to missing. For example, the first and second observations are consecutive 1s. Since 102.88 < 105.29, the second 1 is changed to missing. In other words, high must be set to missing in observations 2, 14, 16, 20 and so on.


There are two methods given by elek:

    * method 1;
    data want;
       retain min max;
       /* pass 1: read one run of equal HIGH values and find its min and max p_sm */
       do until (last.high);
          set tmp1.have;
          by high notsorted;
          if first.high then do;
             min=p_sm;
             max=p_sm;
          end;
          else do;
             min=min(min,p_sm);
             max=max(max,p_sm);
          end;
       end;
       /* pass 2: re-read the same run and blank out HIGH where needed */
       do until (last.high);
          set tmp1.have;
          by high notsorted;
          if high=1 and p_sm^=max then high=.;
          if high=-1 and p_sm^=min then high=.;
          output;
       end;
    run;
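Method 1 is a "double DOW" loop. The first DO UNTIL reads one run of equal high values and records its minimum and maximum p_sm. The second re-reads the same run and blanks out high. Below is a rough Python sketch of the same two-pass logic, under a few assumptions:

* The data is a list of (high, p_sm) pairs in file order.
* None stands in for a SAS missing value.
* The function name is hypothetical.
* The sample values other than 105.29 and 102.88 are made up.

```python
from itertools import groupby


def thin_runs(rows):
    """Two-pass sketch of method 1.

    rows: list of (high, p_sm) tuples in file order.
    Within each run of equal consecutive `high` values, a row keeps
    high=1 only if its p_sm equals the run maximum, and keeps high=-1
    only if its p_sm equals the run minimum; otherwise high becomes None.
    Other values of high are left alone.
    """
    out = []
    # groupby on consecutive equal keys plays the role of BY high NOTSORTED
    for _, run in groupby(rows, key=lambda r: r[0]):
        run = list(run)
        # pass 1: min and max of p_sm within the run
        lo = min(p for _, p in run)
        hi = max(p for _, p in run)
        # pass 2: revisit the same rows and blank out high where needed
        for high, p in run:
            if high == 1 and p != hi:
                high = None
            elif high == -1 and p != lo:
                high = None
            out.append((high, p))
    return out


# 105.29 and 102.88 come from the post; the other values are made up.
data = [(1, 105.29), (1, 102.88), (-1, 99.0), (-1, 97.5), (-1, 98.1), (1, 101.0)]
print(thin_runs(data))
# [(1, 105.29), (None, 102.88), (None, 99.0), (-1, 97.5), (None, 98.1), (1, 101.0)]
```

As in method 1, if several rows of a run tie for the extreme value, all of them keep their high.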

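    * method 2;
    /* number the rows so the summary results can be merged back */
    data tmp1;
       set tmp1.have;
       obs=_n_;
    run;
    /* one row per run of HIGH; with IDMIN, the ID values (p_sm first,
       then obs) are taken from the row with the smallest p_sm */
    proc means data=tmp1 noprint idmin;
       by high notsorted;
       var p_sm;
       id p_sm obs;
       output out=tmp2(drop=_freq_ _type_) min(p_sm)=min max(p_sm)=max;
    run;
    /* same, but by default the ID values come from the row with the
       largest p_sm */
    proc means data=tmp1 noprint;
       by high notsorted;
       var p_sm;
       id p_sm obs;
       output out=tmp3(drop=_freq_ _type_) min(p_sm)=min max(p_sm)=max;
    run;
    /* rows that are neither the min row nor the max row of their run get
       missing MIN and MAX from the merge, so their HIGH is blanked */
    data want;
       merge tmp1 tmp2 tmp3;
       by obs;
       if high=1 and max^=p_sm then high=.;
       if high=-1 and min^=p_sm then high=.;
       drop min max obs;
    run;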
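Method 2 works with row numbers instead. It numbers the rows (obs=_n_), lets PROC MEANS report the obs of each run's extreme rows, and merges back by obs. The Python sketch below follows the same position-based idea. It assumes a list of (high, p_sm) pairs with None for missing, and the function name is hypothetical. One difference: on ties, Python's max() and min() return the first such row, so only that row keeps its high here. Method 1 keeps every tied row.

```python
from itertools import groupby


def thin_runs_by_obs(rows):
    """Position-based sketch of method 2.

    rows: list of (high, p_sm) tuples in file order.
    For every run of high=1, remember the row number of its largest p_sm;
    for every run of high=-1, the row number of its smallest p_sm.
    Then set high to None on all other rows of those runs.
    """
    numbered = list(enumerate(rows))  # like obs=_n_ (0-based here)
    keep = set()
    for high, run in groupby(numbered, key=lambda item: item[1][0]):
        run = list(run)
        if high == 1:
            obs, _ = max(run, key=lambda item: item[1][1])
        elif high == -1:
            obs, _ = min(run, key=lambda item: item[1][1])
        else:
            continue  # other values of high are left alone
        keep.add(obs)
    return [
        (high if high not in (1, -1) or obs in keep else None, p)
        for obs, (high, p) in numbered
    ]


# 105.29 and 102.88 come from the post; the other values are made up.
data = [(1, 105.29), (1, 102.88), (-1, 99.0), (-1, 97.5), (-1, 98.1), (1, 101.0)]
print(thin_runs_by_obs(data))
# [(1, 105.29), (None, 102.88), (None, 99.0), (-1, 97.5), (None, 98.1), (1, 101.0)]
```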

Wednesday, January 26, 2011

If the difference between two consecutive numbers is 7 or less, set both to missing

options formdlim=' ';

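/***************************************************************************
* If the difference between two consecutive numbers is 7 or less,
* set both numbers to missing.
* Changing values inside the comparison loop is a problem: the next
* comparison would use the changed value (a missing value, or a
* placeholder such as 0) instead of the original one, and a 0 placeholder
* would also turn genuine zeros into missing values later. So the original
* values are copied into a temporary array, every comparison uses that
* copy, and only d1-d6 are changed.
***************************************************************************/

data temp;
input d1-d6;
cards;
1 2 7 23 100 1000
2 3 33 54 56 1000
3 . 4 6 44 100
;
run;

data want;
   set temp;
   array s(1:6) d1-d6;           /* the values to change            */
   array orig(1:6) _temporary_;  /* untouched copy of the originals */
   do i=1 to 6;
      orig[i]=s[i];
   end;
   do i=2 to 6;
      /* a missing difference counts as <=7 in SAS, so skip such pairs */
      if nmiss(orig[i],orig[i-1])=0 and orig[i]-orig[i-1]<=7 then do;
         s[i]=.;
         s[i-1]=.;
      end;
   end;
   drop i;
run;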


proc print data=want;
run;
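Here is the same task as a Python sketch built around one idea: flag first, blank afterwards, so no change can affect a later comparison. Rows are lists, with None standing in for a SAS missing value. Skipping pairs that contain None is my assumption about the intended behaviour. In SAS, a difference such as `. - 3` is itself missing and compares as smaller than 7, so an unguarded `<=7` test would flag it. As in the SAS code, the difference is signed (next minus previous). The function name is hypothetical.

```python
def blank_close_pairs(row, limit=7):
    """Return a copy of row where both members of every consecutive pair
    with (next - previous) <= limit are replaced by None.

    All decisions are made on the original values, and values are blanked
    only afterwards. Pairs involving a missing value (None) are skipped.
    """
    flag = [False] * len(row)
    for i in range(1, len(row)):
        prev, cur = row[i - 1], row[i]
        if prev is not None and cur is not None and cur - prev <= limit:
            flag[i - 1] = flag[i] = True
    return [None if f else v for v, f in zip(row, flag)]


rows = [
    [1, 2, 7, 23, 100, 1000],
    [2, 3, 33, 54, 56, 1000],
    [3, None, 4, 6, 44, 100],
]
for r in rows:
    print(blank_close_pairs(r))
# [None, None, None, 23, 100, 1000]
# [None, None, 33, None, None, 1000]
# [3, None, None, None, 44, 100]
```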


Saturday, January 22, 2011

Count how often each value appears in consecutive runs of each length

options formdlim='-';

************* Another problem from MITBBS *****************;
* The question: data x below consists of 1s and 2s only. *;
* We want to count how often each value appears in consecutive runs: *;
* how many times 1 appears once in a row, *;
* how many times 1 appears twice in a row, *;
* and so on, *;
* up to how many times 2 appears n times in a row. *;
* That is, the result should be: *;
/**************************************************************
                         x   counter      freq
                      ----------------------------
                         1         1         3
                         1         4         1
                         2         2         2
                         2         3         1
                         2         5         1
**************************************************************/


data test;
  input x;
  cards;
  1
  2
  2
  1
  1
  1
  1
  2
  2
  2
  2
  2
  1
  2
  2
  1
  2
  2
  2
  ;
run;


data new;
   set test;
   by x notsorted;
   retain counter;
   if first.x then counter=0;
   counter+1;
   if last.x then output;
run;

proc print data=new;
run;

proc sql;
   select x, counter, n(counter) as freq
   from new
   group by x, counter
   order by x, counter
   ;
quit;
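The DATA step that builds NEW is a run-length encoding: one (x, counter) pair per run of equal values. The PROC SQL step then counts identical pairs. Below is a Python sketch of both steps on the same 19 values, using itertools.groupby for the runs and collections.Counter for the GROUP BY.

```python
from collections import Counter
from itertools import groupby

# The same 19 values as data set TEST above.
x = [1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2]

# Like BY x NOTSORTED with FIRST.x/LAST.x: groupby() starts a new group
# whenever the value changes, giving one (x, counter) pair per run.
runs = [(value, len(list(group))) for value, group in groupby(x)]

# Like GROUP BY x, counter ... ORDER BY x, counter in PROC SQL.
freq = sorted(Counter(runs).items())
for (value, counter), n in freq:
    print(value, counter, n)
# 1 1 3
# 1 4 1
# 2 2 2
# 2 3 1
# 2 5 1
```

The printed rows match the expected result in the comment block at the top of this post.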


*** Without PROC SQL, we can use DATA steps and PROC SORT instead: ***;
data new;
   set test;
   by x notsorted;
   retain counter;
   if first.x then counter=0;
   counter+1;
   if last.x then output;
run;

proc print data=new;
run;

proc sort data=new;
  by x  counter;
run;

data want;
   set new;
   by x counter;
   retain freq;
   /* FIRST.x implies FIRST.counter, so FIRST.counter alone marks a new group */
   if first.counter then freq=1;
   else freq=freq+1;
   if last.counter then output;
run;

proc print data=want;
run;
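The sort-then-scan idea behind PROC SORT plus FIRST./LAST. processing is not specific to SAS. Here is a Python sketch of it. The (x, counter) pairs from data set NEW are hard-coded from the data above, and the function name count_sorted is hypothetical. In plain Python, collections.Counter would give the same counts in one call. The explicit loop is only there to mirror the DATA step.

```python
# (x, counter) pairs, one per run, as in data set NEW built from TEST.
runs = [(1, 1), (2, 2), (1, 4), (2, 5), (1, 1), (2, 2), (1, 1), (2, 3)]


def count_sorted(pairs):
    """Count identical (x, counter) pairs by sorting and then scanning."""
    result = []  # list of (x, counter, freq)
    for pair in sorted(pairs):
        if result and result[-1][:2] == pair:
            # same BY group as the previous row: freq=freq+1
            x, counter, freq = result[-1]
            result[-1] = (x, counter, freq + 1)
        else:
            # first row of a new BY group: freq=1
            result.append((*pair, 1))
    return result


for row in count_sorted(runs):
    print(*row)
# 1 1 3
# 1 4 1
# 2 2 2
# 2 3 1
# 2 5 1
```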