Wednesday, July 31, 2013

Fwd: Creating a Multidimensional Array for iris data


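# Assumption: myiris is a copy of R's built-in iris data frame (it is never defined in the original).
myiris <- iris
names(myiris) <- c("sl", "sw", "pl", "pw", "name")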

with(myiris, cut(pl, breaks=quantile(pl, probs=seq(0,1, by=1/3)), include.lowest=TRUE))

plbin=with(myiris, cut(pl, breaks=quantile(pl, probs=seq(0,1, by=1/3)), include.lowest=TRUE))

pwbin=with(myiris, cut(pw, breaks=quantile(pw, probs=seq(0,1, by=1/3)), include.lowest=TRUE))

# Attach both bin columns to the data, then tabulate species by the two bins.
new <- cbind(myiris, plbin, pwbin)

with(new, ftable(table(pwbin, plbin, name), row.vars=1:3))

## or    ftable(xtabs(~name+pwbin+plbin, new))

Thursday, July 25, 2013

The difference between BY and CLASS in PROC MEANS

original post link:


The CLASS and BY statements have similar effects, but there are some subtle differences. The SAS documentation says:


Comparison of the BY and CLASS Statements

Using the BY statement is similar to using the CLASS statement and the NWAY option in that PROC MEANS summarizes each BY group as an independent subset of the input data. Therefore, no overall summarization of the input data is available. However, unlike the CLASS statement, the BY statement requires that you previously sort BY variables. 
When you use the NWAY option, PROC MEANS might encounter insufficient memory for the summarization of all the class variables. You can move some class variables to the BY statement. For maximum benefit, move class variables to the BY statement that are already sorted or that have the greatest number of unique values. 
You can use the CLASS and BY statements together to analyze the data by the levels of class variables within BY groups.


Practically, this means that:

· The input dataset must be sorted by the BY variables. It doesn't have to be sorted by the CLASS variables.

· Without the NWAY option in the PROC MEANS statement, the CLASS statement calculates an overall summary, a summary for each class variable separately, and a summary for each possible combination of class variables. The BY statement only provides summaries for the groups created by the combination of all BY variables.

· The BY summaries are reported in separate tables (pages) whereas the CLASS summaries appear in a single table.

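· The MEANS procedure generally needs less memory to process BY groups than CLASS groups, because it summarizes each BY group as an independent subset of the data. The price is the sort that BY requires.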

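/* Limit processing to the first 10,000 observations. */
options obs=10000;

libname temp "/data02/temp/temp_hsong/to_delete";

proc contents data=temp.high_vis_kws;
run;

/* BY processing requires the input to be sorted by the BY variables. */
proc sort data=temp.high_vis_kws out=high_vis_kws;
        by nrank day_of_week;
run;

/* BY: one output row per nrank*day_of_week group, no overall totals. */
proc summary data=high_vis_kws;
        by nrank day_of_week;
        var clicks visits;
        output out=classby1 sum=;
run;

/* CLASS without NWAY: overall, per-variable and per-combination rows. */
proc summary data=high_vis_kws;
        class nrank day_of_week;
        var clicks visits;
        output out=classby2 sum=;
run;

title "using by";
proc print data=classby1 width=min;
run;

title "using class";
proc print data=classby2 width=min;
run;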
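
The documentation excerpt also mentions two cases that the code above does not exercise: CLASS with the NWAY option, and CLASS and BY used together. The following is a minimal sketch of both. It assumes the sample dataset sashelp.class that ships with SAS (variables sex, age, height, weight) rather than the data used in this post, and the output dataset names are made up for illustration.

```sas
/* Illustrative only: sashelp.class stands in for the post's data. */
proc sort data=sashelp.class out=class_sorted;
        by sex age;
run;

/* CLASS with NWAY: keeps only the rows for the full sex*age
   combination, which is the set of groups BY sex age would produce. */
proc summary data=class_sorted nway;
        class sex age;
        var height weight;
        output out=class_nway mean=;
run;

/* BY and CLASS together: age levels are summarized within each
   value of sex. Each BY group also gets its own total row
   (_TYPE_=0), because NWAY is not specified here. */
proc summary data=class_sorted;
        by sex;
        class age;
        var height weight;
        output out=by_and_class mean=;
run;

title "class with nway";
proc print data=class_nway width=min;
run;

title "by and class together";
proc print data=by_and_class width=min;
run;
```

Comparing the _TYPE_ column across the printed datasets is a quick way to see which levels of summarization each variant produced.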