Thursday, March 8, 2012

split a big data set into several subsets

This question came up when I wanted to email a dataset that was too large to attach, so I needed to split it into several subsets and send them separately.

For a dataset with two variables, 1 million obs usually come to about 10MB after gzip, which is a reasonable size to email.

The example below shows how to split a dataset of 10000 records into several subsets of at most 2300 obs each (four full subsets and a final one with the remaining 800).


options mprint mlogic;

/* build a 10000-row test dataset of standard normal draws (seed=1) */
data test;
  do i=1 to 10000;
    x=rannor(1);
    output;
  end;
run;

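/* maximum number of obs per subset */
%let n_obs=2300;

/* store the total number of obs in test in the macro variable m_total */
proc sql noprint;
  select count(1) into :m_total from test;
quit;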

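%macro sub_data;
  %local i n_dataset;
  /* number of subsets needed, rounding up so the last partial chunk is kept */
  %let n_dataset=%sysfunc(ceil((&m_total/&n_obs)));
  %do i=1 %to &n_dataset;
    /* subset i keeps rows (i-1)*n_obs+1 through i*n_obs of test */
    data sub_data_&i;
      set test;
      if (&i-1)*&n_obs+1<=_n_<=&i*&n_obs;
    run;
  %end;
%mend sub_data;

%sub_data;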
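
The macro above reads all of test once for every subset, which is fine for 10000 records but can get slow on a really big dataset. A possible variation, shown here only as a sketch, uses the FIRSTOBS= and OBS= data set options so that each data step reads just its own range of rows. The names sub_data_fast, sub_fast_&i, first and last are made up for this illustration, and the macro assumes test, &n_obs and &m_total have already been created as above.

%macro sub_data_fast;
  %local i n_dataset first last;
  %let n_dataset=%sysfunc(ceil(&m_total/&n_obs));
  %do i=1 %to &n_dataset;
    /* first and last row number of subset i */
    %let first=%eval((&i-1)*&n_obs+1);
    %let last=%eval(&i*&n_obs);
    /* obs= may run past the end of test; SAS just stops at the last row */
    data sub_fast_&i;
      set test(firstobs=&first obs=&last);
    run;
  %end;
%mend sub_data_fast;

%sub_data_fast;

Both versions should give the same subsets; the difference is only in how many rows each data step has to read.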
