Thursday, March 8, 2012

How to draw several samples from a given data set so that observations already sampled do not appear again?

For example, the data set test has 10,000 observations, and we want to draw three samples called sample_1, sample_2 and sample_3. Observations in sample_1 must not appear in sample_2, and observations in sample_1 or sample_2 must not appear in sample_3.

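In the following we use an indicator variable called group. For sample_1, we draw a simple random sample (SRS) from test and set group=1 for the selected observations. For sample_2 and every later sample, we restrict the input to group<1. A missing numeric value in SAS compares less than any number, so this keeps only observations whose group is still missing and excludes everything already sampled. The observations selected for sample_2 then get group=2, and so on.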


options mprint mlogic;

/* Build a test data set of 10,000 observations; i serves as the row key. */
data test;
  do i=1 to 10000;
    x=rannor(1);
    output;
  end;
run;

%let n_sample=3;

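%macro m_sample;
  %do i=1 %to &n_sample;
    /* From the second pass on, keep only rows whose group is still missing
       (missing compares less than 1), i.e. rows not yet sampled. */
    proc surveyselect data=test %if &i>1 %then %do; (where=(group<1)) %end;
                      method=srs sampsize=100 out=sample_&i;
    run;

    /* Sort both data sets by the key i so they can be merged. */
    proc sort data=sample_&i;
      by i;
    run;
    proc sort data=test;
      by i;
    run;

    /* Flag the rows just selected with the current sample number. */
    data test;
      merge test sample_&i(in=in2);
      by i;
      if in2 then group=&i;
    run;
  %end;
%mend m_sample;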

%m_sample;
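
To check the result, a short sketch like the following may help. It assumes the macro has already run with n_sample=3, so that sample_1 to sample_3 and the updated test exist. The data set names all_samples and dups are made up here for illustration.

```sas
/* Count observations per group; missing means never sampled. */
proc freq data=test;
  tables group / missing;
run;

/* Stack the three samples (hard-coded for n_sample=3). */
data all_samples;
  set sample_1 sample_2 sample_3;
run;

/* Any key i that occurs more than once is written to dups (hypothetical name). */
proc sort data=all_samples nodupkey dupout=dups;
  by i;
run;
```

If the samples do not overlap, dups should contain no observations, and the frequency table should show 100 observations in each of groups 1 to 3, with the remaining 9,700 missing.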
