Thursday, July 5, 2012

R vs SAS 1: R aggregate vs. SAS proc summary

In SAS, proc summary makes it convenient to calculate statistics such as means and sums over different subsets (groupings) of the original data.

In R we can get a similar result with the function "aggregate", or with "tapply" when there is only a single grouping variable.
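For a single grouping variable, "tapply" is often enough. Here is a minimal sketch on the same esoph data (it ships with R's datasets package) that sums ncases per alcohol group both ways. The only difference is the shape of the result: tapply returns a named vector, aggregate a data frame.

```r
# Sum of ncases within each alcohol group: tapply returns a named vector
by_alc <- tapply(esoph$ncases, esoph$alcgp, sum)
print(by_alc)

# The same totals from aggregate, as a data frame with one row per alcgp level
agg_alc <- aggregate(ncases ~ alcgp, data = esoph, FUN = sum)
print(agg_alc)
```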

Example:



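library(stats)  # aggregate() lives here; the esoph data come from the datasets package
# Sums of ncases and ncontrols for every alcgp x tobgp cell
data1 <- aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, FUN = sum)
# Sums for each alcgp level alone (the alcohol-group totals)
data2 <- aggregate(cbind(ncases, ncontrols) ~ alcgp, data = esoph, FUN = sum)
# Attach the alcgp totals to each cell; shared columns get .x (cell) and .y (total) suffixes
merge(data1, data2, by = "alcgp")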



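gives us one row per alcgp/tobgp cell, with the cell sums in ncases.x and ncontrols.x and the alcohol-group totals in ncases.y and ncontrols.y.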
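If all we need is the cell table with its group totals attached, the second aggregate and the merge can be skipped. A sketch using base R's ave(), which returns, for every row, FUN applied to that row's group; the column names tot_ncases and tot_ncontrols are chosen here only to match the SAS step below.

```r
# Cell-level sums for every alcgp x tobgp combination
cells <- aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, FUN = sum)

# ave() repeats each alcgp-level sum on every row of that alcgp group,
# so no second aggregate() or merge() is needed
cells$tot_ncases    <- ave(cells$ncases,    cells$alcgp, FUN = sum)
cells$tot_ncontrols <- ave(cells$ncontrols, cells$alcgp, FUN = sum)

# Order rows by alcohol group, then tobacco group
cells <- cells[order(cells$alcgp, cells$tobgp), ]
print(cells)
```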



We can get the same result in SAS:


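data a;
length agegp alcgp tobgp $ 12; /* list input defaults to 8 bytes, which would truncate labels such as 0-39g/day */
infile "./esoph.txt" firstobs=2;
input agegp $ alcgp $ tobgp $ ncases ncontrols;
run;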
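The post doesn't show where ./esoph.txt comes from. Assuming it is simply R's esoph data exported as whitespace-separated text with one header line (which firstobs=2 skips), a sketch that produces such a file is below. Note that labels like 0-39g/day are 9 characters long, longer than the 8-byte default SAS assigns to character variables read with list input.

```r
# Export esoph as space-separated text: header line first, no quotes, no row names,
# so SAS list input (input agegp $ alcgp $ tobgp $ ncases ncontrols;) can read it
write.table(esoph, file = "esoph.txt", quote = FALSE, row.names = FALSE)

# Look at the first lines SAS will see
print(readLines("esoph.txt", n = 3))
```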

proc print data=a;
run;


proc summary data=a ;
class alcgp tobgp;
var ncases ncontrols;
output out=temp(drop=_freq_) sum=;
run;
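Without the NWAY option, proc summary writes every combination of the CLASS variables, labelled by _TYPE_: with class alcgp tobgp, _TYPE_=0 is the grand total, 1 is by tobgp, 2 is by alcgp and 3 is by both, because the first CLASS variable is the highest bit. A rough R analogue is sketched below; summary_all_types is a hypothetical helper written for this illustration, not a base R function, and it marks unused class variables as NA where SAS leaves them blank.

```r
# Hypothetical helper (not part of base R): mimic proc summary without NWAY by
# summing vars over every subset of the class variables, tagged with SAS's _TYPE_.
summary_all_types <- function(df, classes, vars) {
  k <- length(classes)
  # Use character labels so blocks with and without a class variable stack cleanly
  df[classes] <- lapply(df[classes], as.character)
  # As in SAS, the first CLASS variable is the highest bit of _TYPE_
  weights <- 2^(k - seq_len(k))
  blocks <- lapply(0:(2^k - 1), function(type) {
    used <- classes[bitwAnd(type, weights) > 0]
    if (length(used) == 0) {
      sums <- as.data.frame(lapply(df[vars], sum))  # _TYPE_ = 0: grand total
    } else {
      sums <- aggregate(df[vars], by = df[used], FUN = sum)
    }
    # Class variables not used at this _TYPE_ are missing, as in the SAS output
    for (cl in setdiff(classes, used)) sums[[cl]] <- NA_character_
    sums[["_TYPE_"]] <- type
    sums[c("_TYPE_", classes, vars)]
  })
  do.call(rbind, blocks)
}

temp <- summary_all_types(esoph, c("alcgp", "tobgp"), c("ncases", "ncontrols"))
print(temp)
```

Keeping only the rows with _TYPE_ 3 and 2 and merging them by alcgp then corresponds to the DATA step that follows.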

proc print data=temp;
run;

proc sort data=temp;
by alcgp;
run;

data final(drop=_type_);
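/* _TYPE_=3 rows are the alcgp*tobgp cells, _TYPE_=2 rows the alcgp totals.
   tobgp is blank on the _TYPE_=2 rows, so drop it there; otherwise it would
   overwrite tobgp on the first row of each BY group. */
merge temp(where=(_type_=3)) temp(where=(_type_=2) drop=tobgp rename=(ncases=tot_ncases ncontrols=tot_ncontrols));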
by alcgp;
run;

proc print data=final;
run;



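The final data set has the same layout as the R result: one row per alcgp/tobgp cell, with the alcohol-group totals in tot_ncases and tot_ncontrols.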


