Thursday, August 15, 2013

proc sql: include redundant vars in select statement with group by


This is to show: in proc sql group by statement, select should only contain group by vars and summary vars(vars to be summarized). Otherwise the result may be totally different from wanted.
 
data a;
input id a $2. value;
cards;
1 a 12
2 b 112
3 c 1121
1 a 3
2 b 23
3 c 15
1 a 16
;
run;
 
/* include both id and a in the select statement */
proc sql;
select id, a, sum(value) as v
from a
group by id;
quit;
 
/* include only id in the select statement */
proc sql;
select id, sum(value) as v
from a
group by id;
quit;
 
endsas;
 
/* output for include both id and a: there are duplicates */
      id  a          v
----------------------
       1  a         31
       1  a         31
       1  a         31
       2  b        135
       2  b        135
       3  c       1136
       3  c       1136
/* outpur for include only id */
      id         v
------------------
       1        31
       2       135
       3      1136