easy sas: October 2011

The background is from An Introduction of Generalized Linear Model by A.J.Dobson; It's described as below:

The data are from an experiment to promote the recovery of stroke patients. There were three experimental groups:

A was a new occupational therapy intervention;
B was the existing stroke rehabilitation program conducted in the same hospital where A was conducted;
C was the usual care regime for stroke patients provided in a different hospital.

There were eight patients in each experimental group. The response variable was a measure off unctional ability, the Bartel index; higher scores correspond to better outcomes and the maximum score is 100. Each patient was assessed weekly over the eight weeks of the study.

data table11_1;

input Subject Group $ week1-week8;

datalines;

1 A 45 45 45 45 80 80 80 90

2 A 20 25 25 25 30 35 30 50

3 A 50 50 55 70 70 75 90 90

4 A 25 25 35 40 60 60 70 80

5 A 100 100 100 100 100 100 100 100

6 A 20 20 30 50 50 60 85 95

7 A 30 35 35 40 50 60 75 85

8 A 30 35 45 50 55 65 65 70

9 B 40 55 60 70 80 85 90 90

10 B 65 65 70 70 80 80 80 80

11 B 30 30 40 45 65 85 85 85

12 B 25 35 35 35 40 45 45 45

13 B 45 45 80 80 80 80 80 80

14 B 15 15 10 10 10 20 20 20

15 B 35 35 35 45 45 45 50 50

16 B 40 40 40 55 55 55 60 65

17 C 20 20 30 30 30 30 30 30

18 C 35 35 35 40 40 40 40 40

19 C 35 35 35 40 40 40 45 45

20 C 45 65 65 65 80 85 95 100

21 C 45 65 70 90 90 95 95 100

22 C 25 30 30 35 40 40 40 40

23 C 25 25 30 30 30 30 35 40

24 C 15 35 35 35 40 50 65 65

;

run;

Here a linear model is used to study the relation of Group and Time. The model here used is:
E(Y_{i,j,k}) = \alpha_{i} + \beta_{i} * t_{k} + e_{i,j,k}
Here Y_{i,j,k} is the score at time t_{k} ( k=1 to 8) for patient j (j=1 to 8) in group i (i=1 for Group A, 2 for B and 3 for C);

For linear regression convenience, we need to rearrange the data from wide to long:

data table11_1long;

set table11_1;

array week(8) week1-week8;

do time=1 to 8;

score=week(time);

output;

end;

drop week1-week8;

run;

Also we need to transfer Categorical Variable to Dummy Variable:

data table11_3a;

set table11_1long;

if group='B' then do; a2=1; a3=0; end;

if group='C' then do; a2=0; a3=1; end;

if group='A' then do; a2=0; a3=0; end;

run;

Without considering the dependence of the score for different time, we first assume they are independent. Consider the interaction of time and Group, the model can be expressed as:

proc glm data=table11_3a;

model score = time a2 a3 time*a2 time*a3 / solution ss3;

run;

quit;

The above model considers the interaction of Group and Time. It's a saturated model. It is the same as we make a regression of Score over Time for each Group. So, it can be equally achieved by:

proc reg data=table11_1long;

by group;

model score = time;

run;

Sometimes it's required an exploratory analysis, or called data reduction or data summary. It consists of summarizing the response profiles for each subject by a small number of descriptive statistics based on the assumption that measurements on the same subject are independent. Here intercept and slope are used as the summary statistics:

proc reg data = table11_1long;

by subject;

model score = time;

ods output parameterestimates = table11_4 (keep = subject variable estimate stderr);

run;

quit;

proc tabulate data=table11_4;

var estimate stderr;

class variable subject / order=data;

table subject='Subject'*sum='', (variable='')*(estimate stderr)*f=6.3 / rts=20 row=float;

run;

Then we can get the following summary table:

------------------------------------------------

| | Intercept | time |

| |-------------+-------------|

| |Estim-| ard |Estim-| ard |

| | ate |Error | ate |Error |

|------------------+------+------+------+------|

|------------------| | | | |

|1 |30.000| 7.289| 7.500| 1.443|

|------------------+------+------+------+------|

|2 |15.536| 4.099| 3.214| 0.812|

|------------------+------+------+------+------|

|3 |39.821| 3.209| 6.429| 0.636|

|------------------+------+------+------+------|

|4 |11.607| 3.387| 8.393| 0.671|

|------------------+------+------+------+------|

|5 |100.00| 0.000| 0.000| 0.000|

|------------------+------+------+------+------|

|6 | 0.893| 5.304|11.190| 1.050|

|------------------+------+------+------+------|

|7 |15.357| 4.669| 7.976| 0.925|

|------------------+------+------+------+------|

|8 |25.357| 1.971| 5.893| 0.390|

|------------------+------+------+------+------|

|9 |38.571| 3.522| 7.262| 0.698|

|------------------+------+------+------+------|

|10 |61.964| 2.236| 2.619| 0.443|

|------------------+------+------+------+------|

|11 |14.464| 5.893| 9.702| 1.167|

|------------------+------+------+------+------|

|12 |26.071| 2.147| 2.679| 0.425|

|------------------+------+------+------+------|

|13 |48.750| 8.927| 5.000| 1.768|

|------------------+------+------+------+------|

|14 |10.179| 3.209| 1.071| 0.636|

|------------------+------+------+------+------|

|15 |31.250| 1.948| 2.500| 0.386|

|------------------+------+------+------+------|

|16 |34.107| 2.809| 3.810| 0.556|

|------------------+------+------+------+------|

|17 |21.071| 2.551| 1.429| 0.505|

|------------------+------+------+------+------|

|18 |34.107| 1.164| 0.893| 0.231|

|------------------+------+------+------+------|

|19 |32.143| 1.164| 1.607| 0.231|

|------------------+------+------+------+------|

|20 |42.321| 3.698| 7.262| 0.732|

|------------------+------+------+------+------|

|21 |48.571| 6.140| 7.262| 1.216|

|------------------+------+------+------+------|

|22 |24.821| 1.885| 2.262| 0.373|

|------------------+------+------+------+------|

|23 |22.321| 1.709| 1.845| 0.338|

|------------------+------+------+------+------|

|24 |13.036| 4.492| 6.548| 0.890|

------------------------------------------------

The last part is to do analysis of variance for intercepts and slopes. Since in Table11_4 there is only these four variables: 'subject' 'variable' 'estimate' 'stderr'. In table11_1long there is variable 'time' and 'group'. We need to combine these two data sets:




proc sql;
      create table table11_5 as 
      select table11_4.*, table11_1long.group
      from  table11_4, table11_1long
      where table11_4.subject = table11_1long.subject;
quit;


Then the analysis of Intercepts can be done by:


proc genmod data=table11_5;
      class group (ref='A' param=ref);
      where variable='Intercept';
      model estimate = group ;
run;
quit;


The analysis of Slope can be done by: 


proc genmod data=table11_5;
      class group (ref='A' param=ref);
      where variable='time';
      model estimate = group   ;
run;
quit;


Here we use proc genmod which allows us use categorical variables directly and has the choice of selecting reference level. We don't use proc glm since it has no choice of reference level in the regression.


In the book the author use proc reg to do it. If use proc reg, we need to use dummy variables rather than categorical variable. So, first it requires to combine the table table11_4 with table11_3a together:

proc sql;
      create table table11_5 as 
      select distinct table11_4.*, x.a2 as a2, x.a3 as a3
      from table11_4, table11_3a as x
      where table11_4.subject = x.subject;
quit;


proc reg data=table11_5;
      where variable='Intercept';
      model estimate = a2 a3;
run;
quit;


proc reg data=table11_5;
      where variable='time';
      model estimate = a2 a3;
run;
quit;

It gives the same result as we use proc genmod.

As we know, usually the independence assumption here for the repeated measures is not suitable. A more suitable model for this data should consider the dependence of scores for these 8 time. Generalized Estimation Equations (GEE) model should be more reasonable.

easy sas

Pages

Friday, October 28, 2011

Convert Categorical Variables to Dummy Variables

Wednesday, October 26, 2011

Analysis of variance of Intercept estimates and Slope estimates (proc glm, proc reg, proc genmod)