tag:blogger.com,1999:blog-32075129129491019432024-03-28T02:18:54.218-07:00easy sasKeeping Looking, Don't Settle!sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.comBlogger143125tag:blogger.com,1999:blog-3207512912949101943.post-37808795265426829402014-12-10T09:20:00.000-08:002014-12-10T09:20:49.278-08:00SAS: piece of code to generate codes by SAS PUT statementA colleague asked me this question: there are 50 states, and each state has to be assigned a code from a data set. The original program did this with 50 chained conditions like "if ... then ...; else if ... then ...".<br />
<div>
<br /></div>
<div>
What happens if there are 500 states? Write 500 "if ... else ..." statements by hand?<br />
<div>
<br /></div>
<div>
Hard-coding them like this is not a good idea. If the "if" statements really are needed in the code, a better way is to let SAS generate them with the<b><span style="color: blue;"> PUT</span></b> statement.</div>
<div>
<br /></div>
<div>
Here is an example. In the data set below, i runs from 1 to 10, and each value of i has corresponding x and y values. Suppose we want to assign x and y based on the value of i. This program uses SAS to generate the "if ... else ..." statements:</div>
<div>
<br /></div>
<div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: 'Courier New';">data</span></b><span style="background: white; font-family: 'Courier New';"> a;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">do</span><span style="background: white; font-family: 'Courier New';"> i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">1</span></b><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">to</span><span style="background: white; font-family: 'Courier New';"> </span><b><span style="background: white; color: teal; font-family: 'Courier New';">10</span></b><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';">   x = i + </span><b><span style="background: white; color: teal; font-family: 'Courier New';">5</span></b><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';">   y = i * </span><b><span style="background: white; color: teal; font-family: 'Courier New';">3</span></b><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';">   </span><span style="background: white; color: blue; font-family: 'Courier New';">output</span><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">end</span><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: 'Courier New';">run</span></b><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: 'Courier New';">data</span></b><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">_null_</span><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">set</span><span style="background: white; font-family: 'Courier New';"> a </span><span style="background: white; color: blue; font-family: 'Courier New';">end</span><span style="background: white; font-family: 'Courier New';"> = last;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">if</span><span style="background: white; font-family: 'Courier New';"> _n_ = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">1</span></b><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">then</span><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">put</span><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: purple; font-family: 'Courier New';">"if i = "</span><span style="background: white; font-family: 'Courier New';"> i </span><span style="background: white; color: purple; font-family: 'Courier New';">"then do; x = "</span><span style="background: white; font-family: 'Courier New';"> x </span><span style="background: white; color: purple; font-family: 'Courier New';">"; y = "</span><span style="background: white; font-family: 'Courier New';"> y </span><span style="background: white; color: purple; font-family: 'Courier New';">"; end;"</span><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">put</span><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: purple; font-family: 'Courier New';">"else if i = "</span><span style="background: white; font-family: 'Courier New';"> i </span><span style="background: white; color: purple; font-family: 'Courier New';">"then do; x = "</span><span style="background: white; font-family: 'Courier New';"> x </span><span style="background: white; color: purple; font-family: 'Courier New';">"; y = "</span><span style="background: white; font-family: 'Courier New';"> y </span><span style="background: white; color: purple; font-family: 'Courier New';">"; end;"</span><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">if</span><span style="background: white; font-family: 'Courier New';"> last </span><span style="background: white; color: blue; font-family: 'Courier New';">then</span><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: 'Courier New';">put</span><span style="background: white; font-family: 'Courier New';"> </span><span style="background: white; color: purple; font-family: 'Courier New';">"else do; x = "</span><span style="background: white; font-family: 'Courier New';"> x </span><span style="background: white; color: purple; font-family: 'Courier New';">"; y = "</span><span style="background: white; font-family: 'Courier New';"> y </span><span style="background: white; color: purple; font-family: 'Courier New';">"; end;"</span><span style="background: white; font-family: 'Courier New';">;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: 'Courier New';">run</span></b><span style="background: white; font-family: 'Courier New';">;</span></div>
</div>
<div>
<br /></div>
<div>
The generated code (written to the SAS log) looks like this. Each branch uses "do; ... end;" so that both x and y are assigned; writing "x = 6 and y = 3" would instead set x to the result of the logical expression "6 and y = 3".</div>
<div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">if</span><span style="background: white; font-family: 'Courier New';"> i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">1</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">6</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">3</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">2</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">7</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">6</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">3</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">8</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">9</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">4</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">9</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">12</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">5</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">10</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">15</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">6</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">11</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">18</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">7</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">12</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">21</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">8</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">13</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">24</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">9</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">14</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">27</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> if i = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">10</span></b><span style="background: white; font-family: 'Courier New';"> then do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">15</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">30</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: red; font-family: 'Courier New';">else</span><span style="background: white; font-family: 'Courier New';"> do; x = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">15</span></b><span style="background: white; font-family: 'Courier New';"> ; y = </span><b><span style="background: white; color: teal; font-family: 'Courier New';">30</span></b><span style="background: white; font-family: 'Courier New';"> ; end;</span></div>
</div>
<div>
<br /></div>
<div>
With this approach, you do not need to hard-code the statements in your program.</div>
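<div>
The PUT statement above writes to the SAS log, so the statements still have to be copied into a program. One way to skip that step is to write them to a file with a FILE statement and read them back with %INCLUDE. The sketch below is illustrative rather than the author's exact method: the fileref gencode and the test data set b are hypothetical, and unlike the post's version the catch-all branch assigns missing values when i has no match.</div>

```sas
/* Lookup table from the post: i -> x, y */
data a;
  do i = 1 to 10;
    x = i + 5;
    y = i * 3;
    output;
  end;
run;

/* Temporary file to hold the generated statements (fileref name is hypothetical) */
filename gencode temp;

/* Write one IF / ELSE IF block per row of A into the file instead of the log */
data _null_;
  set a end = last;
  file gencode;
  if _n_ = 1 then put "if i = " i "then do; x = " x "; y = " y "; end;";
  else put "else if i = " i "then do; x = " x "; y = " y "; end;";
  /* Catch-all branch: in this sketch an unmatched i gets missing x and y */
  if last then put "else do; x = .; y = .; end;";
run;

/* %INCLUDE inserts the generated statements into this DATA step */
data b;
  do i = 0 to 11;   /* 0 and 11 are not in A, so they fall into the ELSE branch */
    %include gencode;
    output;
  end;
run;

filename gencode clear;
```

Because the generated text lands in the DATA step at compile time, adding a new state only means adding a row to a and rerunning the generator.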
<div>
<br /></div>
<div>
<br /></div>
<div>
That said, a long chain of "if ... else ..." statements is not the best tool for this kind of lookup. A better approach is to use a <span style="color: blue;"><b>FORMAT</b></span> in SAS.</div>
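<div>
Here is a minimal sketch of the format approach, assuming the same lookup data set a. It builds one numeric format per target variable from the data with PROC FORMAT CNTLIN=, then recovers the number with INPUT(PUT(...)). The format names xfmt and yfmt are hypothetical, and the OTHER entry that maps unmatched values to missing is a choice made for this example.</div>

```sas
/* Lookup table from the post: i -> x, y */
data a;
  do i = 1 to 10;
    x = i + 5;
    y = i * 3;
    output;
  end;
run;

/* CNTLIN data set: one record per (format, value); extra variables are ignored */
data fmt;
  set a end = last;
  length fmtname $ 32 type hlo $ 1 label $ 16;
  type  = 'N';          /* numeric format */
  hlo   = ' ';          /* ordinary single-value range */
  start = i;
  fmtname = 'xfmt'; label = cats(x); output;
  fmtname = 'yfmt'; label = cats(y); output;
  /* OTHER range: any i not in A maps to a missing value */
  if last then do;
    hlo   = 'O';
    label = '.';
    fmtname = 'xfmt'; output;
    fmtname = 'yfmt'; output;
  end;
run;

/* Group the records by format name before loading them */
proc sort data = fmt;
  by fmtname;
run;

proc format cntlin = fmt;
run;

/* Lookup: PUT returns the label text, INPUT turns it back into a number */
data b;
  do i = 0 to 11;   /* 0 and 11 are not in A */
    x = input(put(i, xfmt.), 32.);
    y = input(put(i, yfmt.), 32.);
    output;
  end;
run;
```

The lookup table stays in data, so supporting more states means adding rows to a rather than editing code.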
<div>
<br /></div>
<div>
Another option is to use PROC SQL to join the lookup table to the data and pick up the corresponding values.</div>
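<div>
A sketch of the join approach, assuming a hypothetical data set have that holds the i values needing a lookup. A LEFT JOIN keeps every row of have and leaves x and y missing when there is no match (here, i = 12).</div>

```sas
/* Lookup table from the post: i -> x, y */
data a;
  do i = 1 to 10;
    x = i + 5;
    y = i * 3;
    output;
  end;
run;

/* Hypothetical records that need x and y; 12 has no match in A */
data have;
  input i @@;
  datalines;
3 7 7 12 1
;
run;

proc sql;
  create table want as
  select h.i, lk.x, lk.y
  from have as h
       left join a as lk
       on h.i = lk.i
  order by h.i;
quit;
```

Without ORDER BY, PROC SQL does not guarantee the order of the output rows.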
<div>
<br /></div>
</div>
sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com430tag:blogger.com,1999:blog-3207512912949101943.post-51680308802091784222014-06-18T05:35:00.002-07:002014-06-18T05:36:37.128-07:00Becoming a Data Scientist – Curriculum via MetromapThe picture is from <a href="http://nirvacana.com/thoughts/becoming-a-data-scientist/">Becoming a Data Scientist – Curriculum via Metromap</a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOC2LOU3g76yB5pMPZGf30MtSjYVawWWibSg2B1cKVHsJaddx5G1TBhqwP0XNWbtcuXWrwGjvC3U-XYb24fIohM3pcUuX1C4c3nMFdist4Eb0rqjEC3I2xWOg_cL1XQobDzRx4nDMpwsc/s1600/RoadToDataScientist1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOC2LOU3g76yB5pMPZGf30MtSjYVawWWibSg2B1cKVHsJaddx5G1TBhqwP0XNWbtcuXWrwGjvC3U-XYb24fIohM3pcUuX1C4c3nMFdist4Eb0rqjEC3I2xWOg_cL1XQobDzRx4nDMpwsc/s1600/RoadToDataScientist1.png" height="519" width="640" /></a></div>
<br />shmhttp://www.blogger.com/profile/17397006152965693418noreply@blogger.com1tag:blogger.com,1999:blog-3207512912949101943.post-4271698458005896252013-12-30T22:36:00.000-08:002013-12-30T22:36:16.538-08:00R: how to debug the errorsJust some tips:<br />
<br />
<span style="color: red;">options(error=recover)</span>: when an error occurs, R lists the active function calls and lets you choose which one to browse in the debugger.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjKgGOPhVgeY137xtwsxFQZiYy8WnDEJ-GO9iVD0NxLW2RDcfxLjoNL60t2WqDnPUk_-BqBIZz6cIAhHmP-6To-4nK9LQMD3aXfouMkYLiIFRsn7NgLTOje5BpQGT5CuJRhMqvlpY0SX0/s1600/r_debug_01.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjKgGOPhVgeY137xtwsxFQZiYy8WnDEJ-GO9iVD0NxLW2RDcfxLjoNL60t2WqDnPUk_-BqBIZz6cIAhHmP-6To-4nK9LQMD3aXfouMkYLiIFRsn7NgLTOje5BpQGT5CuJRhMqvlpY0SX0/s1600/r_debug_01.JPG" /></a></div>
<span style="color: red;"><br /></span>
<span style="color: red;">options(show.error.locations=TRUE)</span>: makes R report the source line number where an error occurred, when that information is available.<br />
<br />
A few more tools:<br />
<br />
Use <span style="color: red;">traceback()</span> to see the sequence of calls that led to the last error, then put <span style="color: red;">browser()</span> inside the suspect function and run it again to step through the code and see what goes wrong.sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-54217595931555943562013-12-24T23:19:00.000-08:002013-12-24T23:24:09.322-08:00A note about what is the output in predict.lm(lm_model, test_data, type="terms"):<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSJcy3xz1YZ_eZj4K9fPJJHf-yUmIc7Mdp2SGPdE9P9ceUTV1mx_5zSDT8CPSCfhXxmQlmANX9SBKsrjG6kGJC3VFIUmrYNduA-r3caYbwpxNAc9PsKV_Tl7JQ6uk13O5blmiJ_ds71A0/s1600/predict.lm.type.terms.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSJcy3xz1YZ_eZj4K9fPJJHf-yUmIc7Mdp2SGPdE9P9ceUTV1mx_5zSDT8CPSCfhXxmQlmANX9SBKsrjG6kGJC3VFIUmrYNduA-r3caYbwpxNAc9PsKV_Tl7JQ6uk13O5blmiJ_ds71A0/s1600/predict.lm.type.terms.JPG" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<pre class="r geshifilter-R" style="color: #222222; line-height: 15px; overflow: auto; padding: 10px; word-wrap: normal;"><span style="font-size: large;"><span style="color: #666666; font-style: italic;">## What type="terms" returns: a matrix of predictions on the additive scale, one column per term,</span>
<span style="color: #666666; font-style: italic;">## each column giving the deviations from the overall mean (of the original data's response,</span>
<span style="color: #666666; font-style: italic;">## on the additive scale), which is stored in the attribute "constant".</span>
set.seed(9999)
x1=rnorm(10)
x2=rnorm(10)
y=rnorm(10)
lmm=lm(y~x1+x2)
<span style="color: #666666; font-style: italic;"># new data goes in newdata=; data= is not an argument of predict.lm</span>
predlm=predict(lmm, newdata=data.frame(x1, x2), type="terms")
attr(predlm, "constant")-mean(y)  <span style="color: #666666; font-style: italic;"># ~0</span>
<span style="color: #666666; font-style: italic;"># each column is coefficient * (x - mean(x)), so both lines below are ~0</span>
coef(lmm)[1]+coef(lmm)[2]*x1+coef(lmm)[3]*mean(x2)-mean(y)-predlm[,1]
coef(lmm)[1]+coef(lmm)[3]*x2+coef(lmm)[2]*mean(x1)-mean(y)-predlm[,2]</span></pre>
shmhttp://www.blogger.com/profile/17397006152965693418noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-21886182207893092292013-12-23T22:48:00.004-08:002013-12-24T23:28:09.382-08:00R: Calculate ROC and Plot ROC<pre class="r geshifilter-R" style="line-height: 15px; overflow: auto; padding: 10px; word-wrap: normal;"><span style="font-size: large;"><a href="http://inside-r.org/r-doc/base/library" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">library</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><a href="http://inside-r.org/packages/cran/ROCR" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;">ROCR</a><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;">
</span><a href="http://inside-r.org/r-doc/base/library" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">library</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><a href="http://inside-r.org/packages/cran/Hmisc" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;">Hmisc</a><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;">
</span><span style="color: #666666; font-style: italic; margin: 0px; padding: 0px;">## calculate AUC from the package ROCR and compare with it from Hmisc</span><span style="color: #222222;">
</span><span style="color: #666666; font-style: italic; margin: 0px; padding: 0px;"># method 1: from ROCR</span><span style="color: #222222;">
</span><a href="http://inside-r.org/r-doc/utils/data" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">data</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #222222;">ROCR.simple</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;">
pred=prediction</span><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #222222;">ROCR.simple</span><span style="color: #222222; margin: 0px; padding: 0px;">$</span><span style="color: #222222;">prediction</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #222222;"> ROCR.simple</span><span style="color: #222222; margin: 0px; padding: 0px;">$</span><span style="color: #222222;">labels</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;">
perf=performance</span><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #222222;">pred</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #222222;"> </span><span style="color: blue; margin: 0px; padding: 0px;">'tpr'</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #222222;"> </span><span style="color: blue; margin: 0px; padding: 0px;">'fpr'</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;"> </span><span style="color: #666666; font-style: italic; margin: 0px; padding: 0px;"># true positive rate (y-axis) vs. false positive rate (x-axis)</span><span style="color: #222222;">
</span><a href="http://inside-r.org/r-doc/graphics/plot" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">plot</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #222222;">perf</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #222222;"> colorize=</span><span style="color: #cc66cc;">T</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;">
perf2=performance</span><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #222222;">pred</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #222222;"> </span><span style="color: blue; margin: 0px; padding: 0px;">'auc'</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;">
auc=</span><a href="http://inside-r.org/r-doc/base/unlist" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">unlist</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><a href="http://inside-r.org/r-doc/methods/slot" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">slot</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #222222;">perf2</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #222222;"> </span><span style="color: blue; margin: 0px; padding: 0px;">'y.values'</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;"> </span><span style="color: #666666; font-style: italic; margin: 0px; padding: 0px;"># this is the AUC</span><span style="color: #222222;">
</span><span style="color: #666666; font-style: italic; margin: 0px; padding: 0px;"># method 2: from Hmisc</span><span style="color: #222222;">
rcorrstat=rcorr.cens</span><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #222222;">ROCR.simple</span><span style="color: #222222; margin: 0px; padding: 0px;">$</span><span style="color: #222222;">prediction</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #222222;"> ROCR.simple</span><span style="color: #222222; margin: 0px; padding: 0px;">$</span><span style="color: #222222;">labels</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #222222;">
rcorrstat</span><span style="color: #009900; margin: 0px; padding: 0px;">[</span><span style="color: #cc66cc; margin: 0px; padding: 0px;">1</span><span style="color: #009900; margin: 0px; padding: 0px;">]</span><span style="color: #222222;"> </span><span style="color: #666666; font-style: italic; margin: 0px; padding: 0px;"># 1st element (C Index) is the AUC; 2nd (Dxy) is Somers' D = 2*AUC - 1, also called the Accuracy Ratio, Gini coefficient, or PowerStat</span></span></pre>
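As a cross-check on what the two R methods compute, here is a minimal Python sketch (my own, not the ROCR or Hmisc code) under stated assumptions: labels are coded 1 for positive and 0 for negative, the AUC is the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half), and Somers' D is taken as 2*AUC - 1. The helper names `auc_mann_whitney`, `roc_points`, `trapezoid_auc` and `somers_d` are hypothetical, and the scores below are made up for illustration.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the cutoff through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve through points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def somers_d(auc):
    """Somers' D (Dxy), reported as the accuracy ratio / Gini in credit scoring."""
    return 2.0 * auc - 1.0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # illustrative scores, not ROCR.simple
    labels = [1, 1, 0, 1, 0, 0]
    auc = auc_mann_whitney(scores, labels)
    # both AUC values are 8/9 (about 0.889) for this sample; Somers' D is 7/9
    print(auc, trapezoid_auc(roc_points(scores, labels)), somers_d(auc))
```

The pairwise count and the area under the ROC points agree, which is why ROCR's `'auc'` and the C Index from `rcorr.cens` should report the same number on the same data.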
shmhttp://www.blogger.com/profile/17397006152965693418noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-86191330822003536622013-09-09T23:18:00.001-07:002013-09-09T23:22:13.912-07:00hadoop on cloudera quickstart vm test example 01 wordcount<div dir="ltr">
It's not easy to install Hadoop and related components such as HDFS and Hive on your own, and configuring them after installation is even harder.<br />
<br />
Thanks to Cloudera, we can test Hadoop with its integrated toolkit, the Cloudera QuickStart VM, which can be downloaded in VMware, KVM, and VirtualBox editions. Everything is preconfigured, so you can start testing right away.<br />
<br />
In this video, I show an example of how to run WordCount in Hadoop. The YouTube link is:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.youtube.com/watch?v=ih72o2_7dTw">https://www.youtube.com/watch?v=ih72o2_7dTw</a></div>
<br />
<br />
Steps not included in the video:<br />
1: download and install VMware Player or VirtualBox;<br />
2: download the Cloudera QuickStart VM from Cloudera (the download link changes often, so search for "Cloudera QuickStart VM" to find it).<br /> <br />
<br />
The steps of the WordCount test are:<br />
<br />
1: install wget on the CentOS server<br />
<span style="background-color: #999999;">sudo yum -y install wget</span><br />
2: create a test directory in /home/cloudera<br />
<span style="background-color: #999999;">mkdir /home/cloudera/test</span><br />
<span style="background-color: #999999;">cd /home/cloudera/test</span><br />
3: create a test text file<br />
<span style="background-color: #999999;">echo "what can I do with hadoop on hadoop server or hive server" > test1.txt</span><br />
4: copy the text file into HDFS<br />
<span style="background-color: #999999;">hdfs dfs -mkdir /user/cloudera/input </span><br />
<span style="background-color: #999999;">hdfs dfs -put /home/cloudera/test/test1.txt /user/cloudera/input/ </span><br />
5: go to /usr/lib/hadoop-mapreduce/ <br />
<span style="background-color: #999999;">cd /usr/lib/hadoop-mapreduce/</span><br />
6: run the job (the output directory must not already exist, or the job will fail)<br />
<span style="background-color: #999999;">hadoop jar hadoop-mapreduce-examples.jar wordcount /user/cloudera/input/test1.txt /user/cloudera/output</span><br />
7: list the contents of the output directory<br />
<span style="background-color: #999999;">hdfs dfs -ls /user/cloudera/output/</span><br />
8: read the output file<br />
<span style="background-color: #999999;">hdfs dfs -cat /user/cloudera/output/part-r-00000</span></div>
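The wordcount job in step 6 follows the map, shuffle/sort, reduce pattern. Below is a minimal single-process Python sketch of that pattern, not Hadoop's implementation. It assumes words are split on whitespace and that keys are sorted before the reduce step; the helper names are hypothetical, and `format_part_file` assumes a tab between each word and its count.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort step: group the emitted values by key and sort the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce step: add up the 1s emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle/sort and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle(pairs)]


def format_part_file(counts):
    """Render results as 'word<TAB>count' lines, similar in spirit to part-r-00000."""
    return "\n".join(f"{word}\t{count}" for word, count in counts)


if __name__ == "__main__":
    text = ["what can I do with hadoop on hadoop server or hive server"]
    print(format_part_file(word_count(text)))
```

For the test sentence from step 3, this gives `hadoop` and `server` a count of 2 and every other word a count of 1. Python sorts the uppercase `I` before the lowercase words, as a byte-order sort would.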
sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com28tag:blogger.com,1999:blog-3207512912949101943.post-14030638950518705632013-08-15T15:33:00.001-07:002013-08-15T15:33:09.997-07:00proc sql: include redundant vars in select statement with group by<div dir="ltr"><br><div class="gmail_quote"><div> <font face="Consolas" size="4"><span style="font-size:16pt"> <div>This shows that in a PROC SQL query with GROUP BY, <span style="background-color:yellow">the SELECT clause should contain only the GROUP BY variables and </span><span style="background-color:yellow">summary </span><span style="background-color:yellow">variables</span><span style="background-color:yellow"> (variables computed with summary functions)</span>. If it includes any other variable, such as a below, SAS remerges the summary statistics back onto every input row, so the result can be quite different from what you want.</div> <div> </div> <div>data a;</div> <div>input id a $2. value;</div> <div>cards;</div> <div>1 a 12</div> <div>2 b 112</div> <div>3 c 1121</div> <div>1 a 3</div> <div>2 b 23</div> <div>3 c 15</div> <div>1 a 16</div> <div>;</div> <div>run;</div> <div><font face="Garamond"> </font></div> <div>/* include both id and a in the select statement */</div> <div>proc sql;</div> <div>select <span style="background-color:yellow">id, a</span>, sum(value) as v</div> <div>from a</div> <div>group by id;</div> <div>quit;</div> <div><font face="Garamond"> </font></div> <div>/* include only id in the select statement */</div> <div>proc sql;</div> <div>select <span style="background-color:yellow">id</span>, sum(value) as v</div> <div>from a</div> <div>group by id;</div> <div>quit;</div> <div> </div> <div>endsas;</div> <div> </div> <div>/* output for include both id and a: there are <font color="red">duplicates</font> */</div> <div> id a v</div> <div>----------------------</div> <div> 1 a 31</div> <div><font color="red"> 1 a 31</font></div> <div><font color="red"> 1 a 31</font></div> <div> 2 b 135</div> <div><font color="red"> 2 b 135</font></div> <div> 3 c 1136</div> <div><font color="red"> 3 c 1136</font></div> <div>/* output for 
include only id */ </div> <div> id v</div> <div>------------------</div> <div> 1 31</div> <div> 2 135</div> <div> 3 1136</div> </span></font></div></div></div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-36079583532210699722013-07-31T21:48:00.001-07:002013-07-31T21:48:36.033-07:00Fwd: Creating a Multidimensional Array for iris data<div dir="ltr"><br><div class="gmail_quote"><div dir="ltr"><div>myiris=iris</div><div><br></div><div>names(myiris)=c("sl", "sw", "pl", "pw", "name")<br></div><div><br></div> <div><div>with(myiris, cut(pl, breaks=quantile(pl, probs=seq(0,1, by=1/3)), include.lowest=TRUE))</div> </div><div><br></div><div><div>plbin=with(myiris, cut(pl, breaks=quantile(pl, probs=seq(0,1, by=1/3)), include.lowest=TRUE))</div></div><div><br></div><div><div>pwbin=with(myiris, cut(pw, breaks=quantile(pw, probs=seq(0,1, by=1/3)), include.lowest=TRUE))</div> </div><div><br></div><div><div>cbind(myiris, plbin, pwbin)->new</div></div><div><br></div><div><div style>with(myiris, ftable(table(pwbin, plbin, name), row.vars=1:3))</div></div><div style><br></div><div>## or ftable(xtabs(~name+pwbin+plbin,new))</div> <div><br></div><div><br></div><div><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhspW5MajBt-6bF4u4-92rVRoQW0kqy3IaL7oraTzWMkO2z-lcDdKjvnuRQdF2J-9jpuZ9E_q0378r397o5Ba2DtOeRcf7E6xbyHYMJ115w1do1ooOwqFy_g9LeDK8shGtLA5Pe8tsg_Yw/s1600/image-716033.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhspW5MajBt-6bF4u4-92rVRoQW0kqy3IaL7oraTzWMkO2z-lcDdKjvnuRQdF2J-9jpuZ9E_q0378r397o5Ba2DtOeRcf7E6xbyHYMJ115w1do1ooOwqFy_g9LeDK8shGtLA5Pe8tsg_Yw/s320/image-716033.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5907008181468007842" /></a><a
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJexqtxwStcS1JgetozuBU4FlyU4eHy8Ctli60Cr8vqZZhiPTVpIJ3PW7ChyphenhyphenE3_AbJ9nfMF2BwgFg1K460BIY2PNpN-rfMgeTj2dXD7V65dDFbI4zZYBSRV-Nw5iu88NQfd-VajO83OZM/s1600/image-718233.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJexqtxwStcS1JgetozuBU4FlyU4eHy8Ctli60Cr8vqZZhiPTVpIJ3PW7ChyphenhyphenE3_AbJ9nfMF2BwgFg1K460BIY2PNpN-rfMgeTj2dXD7V65dDFbI4zZYBSRV-Nw5iu88NQfd-VajO83OZM/s320/image-718233.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5907008187536072210" /></a><br></div><div><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1fJ4zqYJeGn0MyonmzoS2m01sOMIAzImyCqhW5uqFsIs02VTQKfxO7tGbQGoJlHn6H9I8ve2fIRRKLeMTOxDVQwuoUfs5Izb3NUOZw8BYW9S_XthyphenhyphenB6GcAOQq5sIbfz3g8yLFTWxOgCA/s1600/image-720534.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1fJ4zqYJeGn0MyonmzoS2m01sOMIAzImyCqhW5uqFsIs02VTQKfxO7tGbQGoJlHn6H9I8ve2fIRRKLeMTOxDVQwuoUfs5Izb3NUOZw8BYW9S_XthyphenhyphenB6GcAOQq5sIbfz3g8yLFTWxOgCA/s320/image-720534.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5907008197311331506" /></a><br> </div><div><br></div></div> </div><br></div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-22999876311923601342013-07-25T18:55:00.001-07:002013-07-25T18:55:21.384-07:00The difference between BY and CLASS in PROC MEANS<div dir="ltr">original post link: <a href="https://communities.sas.com/thread/44285?start=0&tstart=0">https://communities.sas.com/thread/44285?start=0&tstart=0</a><br><br> <br><br><p style="margin:0in 0in 0.0001pt;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline"> <span style="font-size:10pt;font-family:"Helvetica Neue";color:rgb(87,87,87)">CLASS and BY statements have similar effects but there are some subtle differences. 
In the documentation it says:</span></p> <p style="margin:0in 0in 0.0001pt;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline;outline:0px none;min-height:8pt;word-spacing:0px"> <span style="font-size:10pt;font-family:"Helvetica Neue";color:rgb(87,87,87)"> </span></p> <p style="margin:0in 0in 0.0001pt;background:none repeat scroll 0% 0% white;vertical-align:baseline;outline:0px none;word-spacing:0px"> <strong><span style="font-size:14.5pt;font-family:inherit;color:blue;border:1pt none windowtext;padding:0in">Comparison of the BY and CLASS Statements</span></strong><span style="font-size:10pt;font-family:"Helvetica Neue";color:rgb(87,87,87)"></span></p> <p style="margin:0in 0in 0.0001pt;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline;outline:0px none;word-spacing:0px"> <span style="font-size:10pt;font-family:inherit;color:blue;border:1pt none windowtext;padding:0in">Using the BY statement is similar to using the CLASS statement and the NWAY option in that PROC MEANS summarizes each BY group as an independent subset of the input data. Therefore, no overall summarization of the input data is available. However, unlike the CLASS statement, the BY statement requires that you previously sort BY variables.<span> </span></span><span style="font-size:10pt;font-family:"Helvetica Neue";color:rgb(87,87,87)"><br> </span><span style="font-size:10pt;font-family:inherit;color:blue;border:1pt none windowtext;padding:0in">When you use the NWAY option, PROC MEANS might encounter insufficient memory for the summarization of all the class variables. You can move some class variables to the BY statement. 
For maximum benefit, move class variables to the BY statement that are already sorted or that have the greatest number of unique values.<span> </span></span><span style="font-size:10pt;font-family:"Helvetica Neue";color:rgb(87,87,87)"><br> </span><span style="font-size:10pt;font-family:inherit;color:blue;border:1pt none windowtext;padding:0in">You can use the CLASS and BY statements together to analyze the data by the levels of class variables within BY groups.</span><span style="font-size:10pt;font-family:"Helvetica Neue";color:rgb(87,87,87)"></span></p> <p style="margin:0in 0in 0.0001pt;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline;outline:0px none;min-height:8pt;word-spacing:0px"> <span style="font-size:10pt;font-family:"Helvetica Neue";color:rgb(87,87,87)"> </span></p> <p style="margin:0in 0in 0.0001pt;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline;outline:0px none;word-spacing:0px"> <span style="font-size:10pt;font-family:"Helvetica Neue";color:rgb(87,87,87)">Practically, this means that:</span></p> <p class="" style="margin-right:0in;margin-bottom:3pt;margin-left:0in;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline"> <span style="font-size:10pt;font-family:Symbol;color:rgb(87,87,87)"><span>·<span style="font:7pt "Times New Roman""> </span></span></span><span style="font-size:10pt;font-family:inherit;color:rgb(87,87,87)">The input dataset must be sorted by the BY variables. 
It doesn't have to be sorted by the CLASS variables.</span></p> <p class="" style="margin-right:0in;margin-bottom:3pt;margin-left:0in;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline"> <span style="font-size:10pt;font-family:Symbol;color:rgb(87,87,87)"><span>·<span style="font:7pt "Times New Roman""> </span></span></span><span style="font-size:10pt;font-family:inherit;color:rgb(87,87,87)">Without the NWAY option in the PROC MEANS statement, the CLASS statement will calculate summaries for each class variable separately as well as for each possible combination of class variables. The BY statement only provides summaries for the groups created by the combination of all BY variables.</span></p> <p class="" style="margin-right:0in;margin-bottom:3pt;margin-left:0in;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline"> <span style="font-size:10pt;font-family:Symbol;color:rgb(87,87,87)"><span>·<span style="font:7pt "Times New Roman""> </span></span></span><span style="font-size:10pt;font-family:inherit;color:rgb(87,87,87)">The BY summaries are reported in separate tables (pages) whereas the CLASS summaries appear in a single table.</span></p> <p class="" style="margin-right:0in;margin-bottom:3pt;margin-left:0in;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline"> <span style="font-size:10pt;font-family:Symbol;color:rgb(87,87,87)"><span>·<span style="font:7pt "Times New Roman""> </span></span></span><span style="font-size:10pt;font-family:inherit;color:rgb(87,87,87)">The MEANS procedure is more efficient at treating BY groups than CLASS groups.</span></p> <p class="" style="margin-right:0in;margin-bottom:3pt;margin-left:0in;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline"><br></p><p class="" style="margin-right:0in;margin-bottom:3pt;margin-left:0in;line-height:14.25pt;background:none repeat scroll 0% 0% 
white;vertical-align:baseline"> options obs=10000;<br><br>libname temp "/data02/temp/temp_hsong/to_delete";<br><br>proc contents data=temp.high_vis_kws;<br>run;<br><br>proc sort data=temp.high_vis_kws out=high_vis_kws;<br> by nrank day_of_week;<br> run;<br><br>proc summary data=high_vis_kws;<br> by nrank day_of_week;<br> var clicks visits;<br> output out=classby1 sum=;<br>run;<br><br>proc summary data=high_vis_kws;<br> class nrank day_of_week;<br> var clicks visits;<br> output out=classby2 sum=;<br>run;<br><br>title "using by";<br>proc print data=classby1 width=min;<br>run;<br><br>title "using class";<br>proc print data=classby2 width=min;<br> run;<br><br><span style="font-size:10pt;font-family:inherit;color:rgb(87,87,87)"></span></p><p class="" style="margin-right:0in;margin-bottom:3pt;margin-left:0in;line-height:14.25pt;background:none repeat scroll 0% 0% white;vertical-align:baseline"> <span style="font-size:10pt;font-family:inherit;color:rgb(87,87,87)"><br></span></p></div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-2891791597356347832013-04-30T18:31:00.002-07:002013-04-30T18:31:39.139-07:00Distribution of The Difference of Two Uniform Distribution Variable<br />
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13px;">
Suppose <span style="color: blue;">U1~uniform(0,1) and U2~uniform(0,1)</span> are independent. What is the distribution of <span style="color: blue;">U1-U2</span>?</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13px;">
The solution is in the attached picture. Based on it, what is the distribution of <span style="color: red;">U1+U2</span>? Its density should have the same shape, just shifted 1 unit to the right. The reason: if U2 is uniform(0,1), then U3=1-U2 is also uniform(0,1) and independent of U1, so U1+U2 = U1-U3+1, and U1-U3 has the same distribution as U1-U2. Hence the density of U1+U2 is the density of U1-U2 moved 1 unit to the right.</div>
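In case the attached picture does not display, the convolution behind the answer can be written out as follows. This is my own reconstruction of the standard argument, not a transcription of the picture; U3 = 1-U2 as defined above.

```latex
\begin{aligned}
f_{U_1-U_2}(z) &= \int_{-\infty}^{\infty} f_{U_1}(z+u)\, f_{U_2}(u)\, du
  = \int_0^1 \mathbf{1}\{0 \le z+u \le 1\}\, du
  = \bigl|[0,1] \cap [-z,\, 1-z]\bigr| \\
&= \begin{cases}
  1+z, & -1 \le z < 0,\\
  1-z, & 0 \le z \le 1,\\
  0, & \text{otherwise,}
\end{cases}
\qquad\text{i.e. } f_{U_1-U_2}(z) = \max(0,\, 1-|z|), \\
f_{U_1+U_2}(x) &= f_{U_1-U_3}(x-1) = f_{U_1-U_2}(x-1) = \max(0,\, 1-|x-1|), \qquad 0 \le x \le 2 .
\end{aligned}
```

Both densities are triangles of height 1: the difference peaks at 0 on [-1,1], and the sum peaks at 1 on [0,2].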
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibfWvSvzdpnUkAZ_HXGwl-aCH8UC_TfDYdzOoRgtzEpCS4YXsThxtk1hoRYLs4QcCYLaLRXlHnmAG_hwKQ6u4EmRp4eq3jCM686s39YkgFJZKOOqJmiH4UfN2TWCkWl_LVPGGS5R2L8VU/s1600/Distribution+of+The+Difference+of+Two+Uniform+Distribution+Variable.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibfWvSvzdpnUkAZ_HXGwl-aCH8UC_TfDYdzOoRgtzEpCS4YXsThxtk1hoRYLs4QcCYLaLRXlHnmAG_hwKQ6u4EmRp4eq3jCM686s39YkgFJZKOOqJmiH4UfN2TWCkWl_LVPGGS5R2L8VU/s640/Distribution+of+The+Difference+of+Two+Uniform+Distribution+Variable.JPG" width="480" /></a></div>
<div>
<br /></div>
<div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13px;">
The answer can be verified in R. Density plot in R:</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13px;">
<pre style="font-family: Consolas, 'Liberation Mono', Courier, monospace; font-size: 12px; line-height: 16px; padding: 0px; white-space: pre-wrap; width: 744px; word-wrap: break-word;"><pre style="line-height: 15px; overflow: auto; padding: 10px; white-space: pre-wrap; word-wrap: normal;">u1=<a href="http://inside-r.org/r-doc/stats/runif" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;" target="_blank"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">runif</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #cc66cc; margin: 0px; padding: 0px;">1000000</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #cc66cc; margin: 0px; padding: 0px;">0</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #cc66cc; margin: 0px; padding: 0px;">1</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span>
u2=<a href="http://inside-r.org/r-doc/stats/runif" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;" target="_blank"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">runif</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><span style="color: #cc66cc; margin: 0px; padding: 0px;">1000000</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #cc66cc; margin: 0px; padding: 0px;">0</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span><span style="color: #cc66cc; margin: 0px; padding: 0px;">1</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span>
<span style="color: #666666; font-style: italic; margin: 0px; padding: 0px;"># z is the difference of the two uniformly distributed variables</span>
z=u1<span style="margin: 0px; padding: 0px;">-</span>u2
<a href="http://inside-r.org/r-doc/graphics/plot" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;" target="_blank"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">plot</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><a href="http://inside-r.org/r-doc/stats/density" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;" target="_blank"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">density</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span>z<span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span> main=<span style="color: blue; margin: 0px; padding: 0px;">"Density Plot of the Difference of Two Uniform Variables"</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span> <a href="http://inside-r.org/r-doc/base/col" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;" target="_blank"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">col</span></a>=<span style="color: #cc66cc; margin: 0px; padding: 0px;">3</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span>
<span style="color: #666666; font-style: italic; margin: 0px; padding: 0px;"># x is the sum of the two uniformly distributed variables</span>
x=u1<span style="margin: 0px; padding: 0px;">+</span>u2
<a href="http://inside-r.org/r-doc/graphics/plot" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;" target="_blank"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">plot</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span><a href="http://inside-r.org/r-doc/stats/density" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;" target="_blank"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">density</span></a><span style="color: #009900; margin: 0px; padding: 0px;">(</span>x<span style="color: #009900; margin: 0px; padding: 0px;">)</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span> main=<span style="color: blue; margin: 0px; padding: 0px;">"Density Plot of the Sum of Two Uniform Variables"</span><span style="color: #339933; margin: 0px; padding: 0px;">,</span> <a href="http://inside-r.org/r-doc/base/col" style="color: #0077df; margin: 0px; padding: 0px; text-decoration: none;" target="_blank"><span style="color: #003399; font-weight: bold; margin: 0px; padding: 0px;">col</span></a>=<span style="color: #cc66cc; margin: 0px; padding: 0px;">3</span><span style="color: #009900; margin: 0px; padding: 0px;">)</span></pre>
<pre style="line-height: 15px; overflow: auto; padding: 10px; white-space: pre-wrap; word-wrap: normal;"><span style="color: #009900; margin: 0px; padding: 0px;"><img alt="Inline image 1" src="https://mail.google.com/mail/u/0/?ui=2&ik=88fa3dec15&view=att&th=13e5dae08286aa10&attid=0.1&disp=emb&realattid=ii_13e5dad219220b32&zw&atsh=1" /></span></pre>
<pre style="line-height: 15px; overflow: auto; padding: 10px; white-space: pre-wrap; word-wrap: normal;"><span style="color: #009900; margin: 0px; padding: 0px;"><img alt="Inline image 1" src="https://mail.google.com/mail/u/0/?ui=2&ik=88fa3dec15&view=att&th=13e5dae08286aa10&attid=0.2&disp=emb&realattid=ii_13e5967ca757d0bd&zw&atsh=1" /></span></pre>
</pre>
</div>
</div>
<div>
<br /></div>
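The R plots above check the shape by eye. The following Python sketch (my own, not from the original post) makes the check numerical: it compares a 20-bin normalized histogram of simulated U1-U2 and U1+U2 values with the triangular densities 1-|z| and 1-|x-1|. The seed, sample size and helper names are arbitrary assumptions for illustration.

```python
import random


def triangular_pdf(t, center):
    """Triangular density max(0, 1 - |t - center|), supported on [center - 1, center + 1]."""
    return max(0.0, 1.0 - abs(t - center))


def max_histogram_error(samples, center, bins=20):
    """Largest absolute gap between a normalized histogram and triangular_pdf.

    Bins cover [center - 1, center + 1]. With an even number of bins the peak
    sits on a bin edge, so the density is linear inside every bin and its value
    at the bin midpoint is the expected height of that bar.
    """
    lo, width = center - 1.0, 2.0 / bins
    counts = [0] * bins
    for s in samples:
        k = min(int((s - lo) / width), bins - 1)  # clamp floating-point edge cases
        counts[k] += 1
    n = len(samples)
    return max(
        abs(c / (n * width) - triangular_pdf(lo + (k + 0.5) * width, center))
        for k, c in enumerate(counts)
    )


rng = random.Random(2013)  # arbitrary seed
n = 200_000                 # arbitrary sample size
u1 = [rng.random() for _ in range(n)]
u2 = [rng.random() for _ in range(n)]
diff = [a - b for a, b in zip(u1, u2)]   # U1 - U2, values in (-1, 1)
total = [a + b for a, b in zip(u1, u2)]  # U1 + U2, values in [0, 2)
print("U1-U2 max histogram error:", max_histogram_error(diff, center=0.0))
print("U1+U2 max histogram error:", max_histogram_error(total, center=1.0))
```

Small errors for both samples support the same conclusion as the plots: the two densities are the same triangle, one centered at 0 and the other at 1.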
sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com1tag:blogger.com,1999:blog-3207512912949101943.post-78801505930382087422013-04-11T19:19:00.001-07:002013-04-11T19:19:31.451-07:00Read SAS data into R (SAS is required)<div dir="ltr"><span style="font-family:Garamond,serif;font-size:16pt">These three methods can read SAS data into R. I prefer the first one because it gives the most control.</span><br><div class="gmail_quote"><div lang="EN-US" link="blue" vlink="purple"> <p class="MsoNormal"><br></p> <pre><i><span style="font-size:12.0pt;color:#666666">### To read SAS data into R (SAS is required)</span></i></pre> <pre><i><span style="font-size:12.0pt;color:red">##1</span></i><i><span style="font-size:12.0pt;color:#666666">: first use SAS to export the data to a CSV or other delimited file, then read it with R</span></i><span style="font-size:12.0pt;color:#222222"></span></pre> <pre><span style="font-size:12.0pt;color:#222222">libname test </span><span style="font-size:12.0pt;color:blue">"/data/temp/hsong/test"</span><span style="font-size:12.0pt;color:#339933">;</span></pre> <pre><span style="font-size:12.0pt;color:#222222">proc export data=test.hsb12 outfile=</span><span style="font-size:12.0pt;color:blue">"/data/temp/hsong/test/sasxport"</span><span style="font-size:12.0pt;color:#222222"> dbms=dlm replace</span><span style="font-size:12.0pt;color:#339933">;</span><span style="font-size:12.0pt;color:#222222"></span></pre> <pre><span style="font-size:12.0pt;color:#222222"> delimiter=</span><span style="font-size:12.0pt;color:blue">","</span><span style="font-size:12.0pt;color:#339933">;</span><span style="font-size:12.0pt;color:#222222"></span></pre> <pre><span style="font-size:12.0pt;color:#222222">run</span><span
style="font-size:12.0pt;color:#339933">;</span><span style="font-size:12pt;font-family:arial"> </span></pre> <pre><i><span style="font-size:12.0pt;color:#666666"># Then read the exported comma-delimited file with R</span></i><span style="font-size:12.0pt;color:#222222"></span></pre> <pre><span style="font-size:12.0pt;color:#222222"><a href="http://inside-r.org/r-doc/utils/read.table" target="_blank"><b><span style="color:#003399">read.table</span></b></a></span><span style="font-size:12.0pt;color:#009900">(</span><span style="font-size:12.0pt;color:blue">"/data/temp/hsong/test/sasxport"</span><span style="font-size:12.0pt;color:#339933">,</span><span style="font-size:12.0pt;color:#222222"> header=TRUE</span><span style="font-size:12.0pt;color:#339933">,</span><span style="font-size:12.0pt;color:#222222"> sep=</span><span style="font-size:12.0pt;color:blue">","</span><span style="font-size:12.0pt;color:#009900">)</span></pre> <pre><span style="font-family:arial;font-size:12pt"> </span></pre> <pre><i><span style="font-size:12.0pt;color:red">##2</span></i><i><span style="font-size:12.0pt;color:#666666">: read it with sas.get() from the {Hmisc} package</span></i></pre> <pre><span style="font-size:12.0pt;color:#222222"><a href="http://inside-r.org/r-doc/base/library" target="_blank"><b><span style="color:#003399">library</span></b></a></span><span style="font-size:12.0pt;color:#009900">(</span><span style="font-size:12.0pt;color:#222222"><a href="http://inside-r.org/packages/cran/Hmisc" target="_blank"><span style="color:#0077df">Hmisc</span></a></span><span style="font-size:12.0pt;color:#009900">)</span><span style="font-size:12pt;font-family:arial"> </span></pre> <pre><span style="font-size:12.0pt;color:#222222">hsb12=sas.get</span><span style="font-size:12.0pt;color:#009900">(</span><span style="font-size:12.0pt;color:#222222">lib=</span><span style="font-size:12.0pt;color:blue">"/data/temp/hsong/test"</span><span style="font-size:12.0pt;color:#339933">,</span><span style="font-size:12.0pt;color:#222222"> mem=</span><span style="font-size:12.0pt;color:blue">"hsb12"</span><span style="font-size:12.0pt;color:#339933">,</span><span
style="font-size:12.0pt;color:#222222"> as.is=TRUE</span><span style="font-size:12.0pt;color:#009900">)</span><span style="font-size:12.0pt;color:#222222"></span></pre> <pre><span style="font-size:12pt;font-family:arial"> </span><br></pre> <pre><i><span style="font-size:12.0pt;color:red">##3</span></i><i><span style="font-size:12.0pt;color:#666666">: read with read.xport() from the {foreign} package; I could not get this to work. Note that read.xport() expects a SAS XPORT transport file, not a native .sas7bdat data set.</span></i><span style="font-size:12.0pt;color:#222222"></span></pre> <pre><span style="font-size:12.0pt;color:#222222"><a href="http://inside-r.org/r-doc/base/library" target="_blank"><b><span style="color:#003399">library</span></b></a></span><span style="font-size:12.0pt;color:#009900">(</span><span style="font-size:12.0pt;color:#222222">foreign</span><span style="font-size:12.0pt;color:#009900">)</span></pre> <pre><span style="font-size:12.0pt;color:#222222"><a href="http://inside-r.org/r-doc/foreign/read.xport" target="_blank"><b><span style="color:#003399">read.xport</span></b></a></span><span style="font-size:12.0pt;color:#009900">(</span><span style="font-size:12.0pt;color:blue">"path"</span><span style="font-size:12.0pt;color:#009900">)</span></pre> </div></div></div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-72052063496535898172013-04-11T19:11:00.001-07:002013-04-11T19:17:50.538-07:00use scan to read in a piece of data for ad-hoc analysis<div dir="ltr">
<span style="font-size: 12pt;"><span style="font-family: courier new, monospace;">Sometimes you need to read in a small piece of data for ad-hoc analysis. One option is to save the data to a text file and read it with read.table, but then you need the physical path of the file. That is inconvenient for ad-hoc work, especially when the data is on a server and R runs on a local machine.</span></span><br />
<div class="gmail_quote">
<div lang="EN-US" link="blue" vlink="purple">
<div>
<pre><span style="color: #222222; font-size: 12.0pt;"> </span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;">A more convenient way is to use the scan function in R and paste the data directly at its prompt (an empty line ends the input), like this:</span></pre>
<pre style="line-height: 11.25pt;"><span style="color: #222222; font-size: 9.0pt;"> </span></pre>
<pre style="line-height: 11.25pt;"><span style="color: #222222; font-size: 9.0pt;">x=<a href="http://inside-r.org/r-doc/base/scan" target="_blank"><b><span style="color: #003399;">scan</span></b></a></span><span style="color: #009900; font-size: 9.0pt;">(</span><span style="color: #222222; font-size: 9.0pt;">what=</span><span style="color: #222222; font-size: 9.0pt;"><a href="http://inside-r.org/r-doc/base/list" target="_blank"><b><span style="color: #003399;">list</span></b></a></span><span style="color: #009900; font-size: 9.0pt;">(</span><span style="color: #222222; font-size: 9.0pt;">a1=</span><span style="color: #cc66cc; font-size: 9.0pt;">0</span><span style="color: #339933; font-size: 9.0pt;">,</span><span style="color: #222222; font-size: 9.0pt;">a2=</span><span style="color: #cc66cc; font-size: 9.0pt;">0</span><span style="color: #009900; font-size: 9.0pt;">))</span><span style="color: #222222; font-size: 9.0pt;"></span></pre>
<pre style="line-height: 11.25pt;"><span style="color: #222222; font-size: 9.0pt;"> </span></pre>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixIIIweLHXjirVY4OpF6x3XZ_YopovYpS22pozbHkedhZ5V2p4fd0N-K4TvjSCwNNjAZ01xNqcad6y50eNtaiXKYM6TL8xPWvwrq4uUhPZe0HECCnoY9hgnB1eCHvYYkb9tnRKatHxo9A/s1600/image001-707982.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5865777318781975538" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixIIIweLHXjirVY4OpF6x3XZ_YopovYpS22pozbHkedhZ5V2p4fd0N-K4TvjSCwNNjAZ01xNqcad6y50eNtaiXKYM6TL8xPWvwrq4uUhPZe0HECCnoY9hgnB1eCHvYYkb9tnRKatHxo9A/s320/image001-707982.png" /></a></div>
<div class="MsoNormal">
<br /></div>
<pre style="line-height: 11.25pt;"><span style="color: #222222; font-size: 9.0pt;">y=<a href="http://inside-r.org/r-doc/base/data.frame" target="_blank"><b><span style="color: #003399;">data.frame</span></b></a></span><span style="color: #009900; font-size: 9.0pt;">(</span><span style="color: #222222; font-size: 9.0pt;">x</span><span style="color: #009900; font-size: 9.0pt;">)</span><span style="color: #222222; font-size: 9.0pt;"></span></pre>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt_gHE-N06inTe5pTBEoHNcLQahPF5xeDj7-20XcEqauw8FO1N7xKXOpxFvw5tluXSzP_jE3FeG_73mPzxdckbdICfke_Jq-1nI3RlJN4S7DDTzpn0CxOsMSY831ToEWZ_vqufCCPGto4/s1600/image002-709728.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5865777323303348722" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt_gHE-N06inTe5pTBEoHNcLQahPF5xeDj7-20XcEqauw8FO1N7xKXOpxFvw5tluXSzP_jE3FeG_73mPzxdckbdICfke_Jq-1nI3RlJN4S7DDTzpn0CxOsMSY831ToEWZ_vqufCCPGto4/s320/image002-709728.png" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 12.0pt;">Then run any analysis (plots and so on) on the data frame y.</span></div>
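<div class="MsoNormal">
A minimal, self-contained sketch of the same idea, assuming the pasted block has two numeric columns. Because a script cannot paste into the console, it passes the text through the text= argument of scan instead; the string pasted and its values are hypothetical. With what=list(a1=0, a2=0), scan reads the values as records of two numeric fields, so typing scan(what=list(a1=0, a2=0)) interactively, pasting the same lines and then entering an empty line should give the same result.</div>

```r
# Hypothetical data standing in for what you would paste at the scan() prompt
pasted = "1 10
2 20
3 30
4 41"

# what= is a template: a list means each record has two numeric fields, a1 and a2
x = scan(text = pasted, what = list(a1 = 0, a2 = 0), quiet = TRUE)

# scan returns a list of equal-length vectors, which data.frame turns into columns
y = data.frame(x)
print(y)

# any ad-hoc analysis can now use y, e.g. a quick linear fit
fit = lm(a2 ~ a1, data = y)
print(coef(fit))
```

<div class="MsoNormal">
quiet = TRUE only suppresses the "Read 4 records" message; drop it if you want that count as a sanity check.</div>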
</div>
<div style="font-family: 'arial',sans-serif; font-size: 7.5pt; line-height: 10pt;">
<br /></div>
</div>
</div>
</div>
sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-71906367625656934962013-04-11T19:09:00.001-07:002013-04-11T19:18:04.733-07:00Tips to increase the data reading speed in R<div dir="ltr">
<span style="font-family: Garamond,serif; font-size: 16pt;">Here are some tips to increase the data reading speed in R:</span><br />
<div class="gmail_quote">
<div lang="EN-US" link="blue" vlink="purple">
<div class="MsoNormal">
<br /></div>
<pre><i><span style="color: #666666; font-size: 12.0pt;">#The purpose is to compare the processing time when using different options to read in data with read.table</span></i><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><i><span style="color: #666666; font-size: 12.0pt;"># test to read /data02/temp/temp_hsong/test/hsb12.sas7bdat</span></i><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"> </span></pre>
<pre><i><span style="color: #666666; font-size: 12.0pt;">#1: without any options </span></i><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/base/system.time" target="_blank"><b><span style="color: #003399;">system.time</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/utils/read.table" target="_blank"><b><span style="color: #003399;">read.table</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: blue; font-size: 12.0pt;">"/data02/temp/temp_hsong/test/sasxport.txt"</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> header=T</span><span style="color: #009900; font-size: 12.0pt;">))</span><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"> </span></pre>
<pre><i><span style="color: #666666; font-size: 12.0pt;">#2: By default (in R versions before 4.0.0) R converts character variables into factors while reading</span></i><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><i><span style="color: #666666; font-size: 12.0pt;"># stringsAsFactors=F stops R from converting character variables into factors</span></i><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/base/system.time" target="_blank"><b><span style="color: #003399;">system.time</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/utils/read.table" target="_blank"><b><span style="color: #003399;">read.table</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: blue; font-size: 12.0pt;">"/data02/temp/temp_hsong/test/sasxport.txt"</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> header=T</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> stringsAsFactors=F</span><span style="color: #009900; font-size: 12.0pt;">))</span><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"> </span></pre>
<pre><i><span style="color: #666666; font-size: 12.0pt;">#3: if the data has no comment characters, tell R so with comment.char=''</span></i><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/base/system.time" target="_blank"><b><span style="color: #003399;">system.time</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/utils/read.table" target="_blank"><b><span style="color: #003399;">read.table</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: blue; font-size: 12.0pt;">"/data02/temp/temp_hsong/test/sasxport.txt"</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> header=T</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> comment.char=</span><span style="color: blue; font-size: 12.0pt;">''</span><span style="color: #009900; font-size: 12.0pt;">))</span><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"> </span></pre>
<pre><i><span style="color: #666666; font-size: 12.0pt;">#4: set nrows to a number slightly greater than the number of records</span></i><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/base/system.time" target="_blank"><b><span style="color: #003399;">system.time</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/utils/read.table" target="_blank"><b><span style="color: #003399;">read.table</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: blue; font-size: 12.0pt;">"/data02/temp/temp_hsong/test/sasxport.txt"</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> header=T</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> nrows=</span><span style="color: #cc66cc; font-size: 12.0pt;">8000</span><span style="color: #009900; font-size: 12.0pt;">))</span><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"> </span></pre>
<pre><i><span style="color: #666666; font-size: 12.0pt;">#5: it's better to tell R the class of each variable with colClasses</span></i><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<pre><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/base/system.time" target="_blank"><b><span style="color: #003399;">system.time</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/utils/read.table" target="_blank"><b><span style="color: #003399;">read.table</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: blue; font-size: 12.0pt;">"/data02/temp/temp_hsong/test/sasxport.txt"</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> header=T</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> sep=</span><span style="color: blue; font-size: 12.0pt;">','</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #222222; font-size: 12.0pt;"> colClasses=<a href="http://inside-r.org/r-doc/base/c" target="_blank"><b><span style="color: #003399;">c</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: #222222; font-size: 12.0pt;"><a href="http://inside-r.org/r-doc/base/rep" target="_blank"><b><span style="color: #003399;">rep</span></b></a></span><span style="color: #009900; font-size: 12.0pt;">(</span><span style="color: blue; font-size: 12.0pt;">"numeric"</span><span style="color: #339933; font-size: 12.0pt;">,</span><span style="color: #cc66cc; font-size: 12.0pt;">11</span><span style="color: #009900; font-size: 12.0pt;">))))</span><span style="color: #222222; font-size: 12.0pt;"></span></pre>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Garamond","serif"; font-size: 16.0pt;">The processing time is:</span></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhw-mVU1CUMKrAyQO9AUjut6OOqkAoE4lbd7f1CNfgFshFS-NpkZVZye9WcCETJlxW0NqVi-eI5LDxczjD8ycN8Akv5Q6xKg5uAPh9lB0MACzpMtLq_G6hSZzSR4B-fSQtpg7AXoHaJnM/s1600/image001-799503.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5865776851060400402" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhw-mVU1CUMKrAyQO9AUjut6OOqkAoE4lbd7f1CNfgFshFS-NpkZVZye9WcCETJlxW0NqVi-eI5LDxczjD8ycN8Akv5Q6xKg5uAPh9lB0MACzpMtLq_G6hSZzSR4B-fSQtpg7AXoHaJnM/s320/image001-799503.png" /></a><span style="font-family: "Garamond","serif"; font-size: 16.0pt;"></span></div>
<div class="MsoNormal">
<br /></div>
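<div class="MsoNormal">
A self-contained sketch that combines tips #2 to #5 in a single read.table call and times it against the default call. The test file is hypothetical (8000 rows of 11 random numeric columns written to a temporary comma-separated file), so the timings depend on your machine and data and no particular speed-up is implied. As an illustrative assumption, colClasses is taken from the classes R infers on the first few rows, and nrows = n + 100 plays the role of the "slightly greater" estimate in tip #4.</div>

```r
# Hypothetical test file: 8000 rows x 11 numeric columns, comma-separated
set.seed(1)
n = 8000
dat = as.data.frame(matrix(rnorm(n * 11), ncol = 11))
f = tempfile(fileext = ".txt")
write.table(dat, f, sep = ",", row.names = FALSE)

# Default call: read.table has to guess every column type
t_default = system.time(
  d1 <- read.table(f, header = TRUE, sep = ",")
)

# Learn the column classes from the first few rows (assumes they are representative)
classes = sapply(read.table(f, header = TRUE, sep = ",", nrows = 5), class)

# Combined hints: no factor conversion, no comment scanning,
# a slight over-estimate of the row count, and known column classes
t_tuned = system.time(
  d2 <- read.table(f, header = TRUE, sep = ",",
                   stringsAsFactors = FALSE,
                   comment.char = "",
                   nrows = n + 100,
                   colClasses = classes)
)

cat("default elapsed:", t_default[["elapsed"]], "s\n")
cat("tuned elapsed:  ", t_tuned[["elapsed"]], "s\n")

# Both calls should produce the same data
stopifnot(isTRUE(all.equal(d1, d2)))
unlink(f)
```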
</div>
</div>
</div>
sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-74006352213118499072013-04-01T19:06:00.001-07:002013-04-01T19:06:25.939-07:00Possible reasons for getting a coefficient with an unexpected sign (e.g., we expect it to be positive, but it comes out negative)<div dir="ltr"><span style="font-family:Garamond,serif;font-size:16pt">Sometimes we get a coefficient with an unexpected sign. Consider </span><span style="font-family:Garamond,serif;font-size:16pt;color:red">sell_amount vs sell_price</span><span style="font-family:Garamond,serif;font-size:16pt">: common sense says we should expect a negative coefficient, because the amount sold should decrease as the price increases. But sometimes the model gives a positive coefficient.</span><br> <div class="gmail_quote"><div lang="EN-US" link="blue" vlink="purple"><div> <p class="MsoNormal"><span style="font-size:16.0pt;font-family:"Garamond","serif""> </span></p> <p class="MsoNormal"><span style="font-size:16.0pt;font-family:"Garamond","serif"">One possible reason is that we <span style="color:red">put highly correlated variables among the predictors</span>: suppose we have both <span style="color:red">sell_price and product_price </span>in the model, and product_price is known to be highly correlated with sell_price. 
Example 1 below shows this.</span></p> <p class="MsoNormal"><span style="font-size:16.0pt;font-family:"Garamond","serif""> </span></p> <p class="MsoNormal"><span style="font-size:16.0pt;font-family:"Garamond","serif"">Another possible reason is how we <span style="color:red">recode the missing data</span>: the missing values may belong at the low end of x, but we recode them to the high end (e.g., if x<0 we set x=99999, a very large number).</span></p> <p class="MsoNormal"><span style="font-size:16.0pt;font-family:"Garamond","serif""> </span></p> <p class="MsoNormal"><b><i><span style="font-size:20.0pt;font-family:"Garamond","serif";color:blue">Example 1:</span></i></b></p> <pre><i><span style="font-size:16.0pt;color:#666666"># set up random number seed</span></i><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"><a href="http://inside-r.org/r-doc/base/set.seed" target="_blank"><b><span style="color:#003399">set.seed</span></b></a></span><span style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#cc66cc">1000</span><span style="font-size:16.0pt;color:#009900">)</span><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><i><span style="font-size:16.0pt;color:#666666"># generate 100 x with x ~ N(5,1)</span></i><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222">x=<a href="http://inside-r.org/r-doc/stats/rnorm" target="_blank"><b><span style="color:#003399">rnorm</span></b></a></span><span style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#cc66cc">100</span><span style="font-size:16.0pt;color:#339933">,</span><span style="font-size:16.0pt;color:#cc66cc">5</span><span style="font-size:16.0pt;color:#339933">,</span><span style="font-size:16.0pt;color:#cc66cc">1</span><span style="font-size:16.0pt;color:#009900">)</span></pre> 
<pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-OukNRavAa5kjBxfB5FE5TFq2cnB1Pn-WYi2xu1bMjnDbAbZ2qy5fs8nrrbElfly46vpbUKm3FJqGaMuCztFpoE8wP3robddgf90V2dfOnoqDRUKZj9Hak0ZK-o2WBNOx7FnN7FjMqTI/s1600/image001-785939.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-OukNRavAa5kjBxfB5FE5TFq2cnB1Pn-WYi2xu1bMjnDbAbZ2qy5fs8nrrbElfly46vpbUKm3FJqGaMuCztFpoE8wP3robddgf90V2dfOnoqDRUKZj9Hak0ZK-o2WBNOx7FnN7FjMqTI/s320/image001-785939.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5862065083323258754" /></a></span><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><i><span style="font-size:16.0pt;color:#666666"># Y=5*X</span></i><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222">y=</span><span style="font-size:16.0pt;color:#cc66cc">5</span><span style="font-size:16.0pt;color:#222222">*x+<a href="http://inside-r.org/r-doc/stats/rnorm" target="_blank"><b><span style="color:#003399">rnorm</span></b></a></span><span style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#cc66cc">100</span><span style="font-size:16.0pt;color:#339933">,</span><span style="font-size:16.0pt;color:#cc66cc">0</span><span style="font-size:16.0pt;color:#339933">,</span><span style="font-size:16.0pt;color:#cc66cc">1</span><span style="font-size:16.0pt;color:#009900">)</span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfGCrte-ZhskF7SXhIECYAHFZwgs7bqjfvwZtQdRmotQTXx6DNHMzwCVxh9H1_cuPkdNxLVdVnUdbrx5yNJiDY3KmEQxgX2h3OT5lFJGwiO14NpebIwizo6sLb2bcnoMlTRxDsZzAxsX4/s1600/image002-787313.png"><img 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfGCrte-ZhskF7SXhIECYAHFZwgs7bqjfvwZtQdRmotQTXx6DNHMzwCVxh9H1_cuPkdNxLVdVnUdbrx5yNJiDY3KmEQxgX2h3OT5lFJGwiO14NpebIwizo6sLb2bcnoMlTRxDsZzAxsX4/s320/image002-787313.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5862065085592603426" /></a></span><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><i><span style="font-size:16.0pt;color:#666666"># X1=100*X</span></i><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222">x1=x*</span><span style="font-size:16.0pt;color:#cc66cc">100</span><span style="font-size:16.0pt;color:#222222">+<a href="http://inside-r.org/r-doc/stats/rnorm" target="_blank"><b><span style="color:#003399">rnorm</span></b></a></span><span style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#cc66cc">100</span><span style="font-size:16.0pt;color:#339933">,</span><span style="font-size:16.0pt;color:#cc66cc">0</span><span style="font-size:16.0pt;color:#339933">,</span><span style="font-size:16.0pt;color:#cc66cc">.2</span><span style="font-size:16.0pt;color:#009900">)</span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMa6c0jgYQD2EpWaQaAvcTl3VcTXiszvXYWvhJNj7qBQmYYkxAU5CMP8v7gguS7qhPS-fEeRuVyLb0WHUn3M0GMZSnC3mj_u-llAeitorlQeazJSjcLVH6Oa-vavjoqYyACmK8-N4u8c8/s1600/image003-789116.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMa6c0jgYQD2EpWaQaAvcTl3VcTXiszvXYWvhJNj7qBQmYYkxAU5CMP8v7gguS7qhPS-fEeRuVyLb0WHUn3M0GMZSnC3mj_u-llAeitorlQeazJSjcLVH6Oa-vavjoqYyACmK8-N4u8c8/s320/image003-789116.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5862065095571458946" /></a></span><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"> 
</span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt;color:red">1) As expected, the regression coefficient is 4.947 (the true value is 5). </span></pre> <pre><i><span style="font-size:16.0pt;color:#666666">#relation between x and y</span></i><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"><a href="http://inside-r.org/r-doc/stats/lm" target="_blank"><b><span style="color:#003399">lm</span></b></a></span><span style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#222222">y~x</span><span style="font-size:16.0pt;color:#009900">)</span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAoe8AXMXZ2O-qX8tl2RuWF6_hnxdMfrotdARoQp7GhWUAygPv4kwWk0m9DHjMrhyphenhyphenNWZchksV0FFakI9m80cjjpoYd4e4Akj0g7Rn8wer5ewgQslpWBgYwIaCWgqdsI1o_G5n3mc99W0Y/s1600/image004-790739.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAoe8AXMXZ2O-qX8tl2RuWF6_hnxdMfrotdARoQp7GhWUAygPv4kwWk0m9DHjMrhyphenhyphenNWZchksV0FFakI9m80cjjpoYd4e4Akj0g7Rn8wer5ewgQslpWBgYwIaCWgqdsI1o_G5n3mc99W0Y/s320/image004-790739.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5862065099939345922" /></a></span><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt;color:red">2) Because x1 is about 100*x, the coefficient for x1 should be about 1/100 of the one above:</span></pre> <pre><i><span style="font-size:16.0pt;color:#666666"># relation between x1 and y</span></i><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"><a href="http://inside-r.org/r-doc/stats/lm" target="_blank"><b><span style="color:#003399">lm</span></b></a></span><span 
style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#222222">y~x1</span><span style="font-size:16.0pt;color:#009900">)</span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0HkjaZeziWsEZhdvECciTl3M_dsGqjzDnxFZrcexFUNQQ2rDP4eWgGhxMSqG6r5iP58WIWtzAfpB0JdBBWMAZ1Wh9vUUHovkiGUk0uYFQ6lUJnYP_jgYgkJMOqHLBz2kIIokRnVvfzms/s1600/image005-791956.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0HkjaZeziWsEZhdvECciTl3M_dsGqjzDnxFZrcexFUNQQ2rDP4eWgGhxMSqG6r5iP58WIWtzAfpB0JdBBWMAZ1Wh9vUUHovkiGUk0uYFQ6lUJnYP_jgYgkJMOqHLBz2kIIokRnVvfzms/s320/image005-791956.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5862065110261026482" /></a></span><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt;color:red">3) But if we put both x and x1 in the model, we no longer get positive coefficients for both x and x1. This is caused by multicollinearity: x1 is almost an exact multiple of x, so the model cannot reliably separate their effects. 
</span></pre> <pre><i><span style="font-size:16.0pt;color:#666666"># put both x and x1 in as predictors</span></i><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"><a href="http://inside-r.org/r-doc/stats/lm" target="_blank"><b><span style="color:#003399">lm</span></b></a></span><span style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#222222">y~x+x1</span><span style="font-size:16.0pt;color:#009900">)</span></pre> <pre><span style="font-size:16.0pt;color:#009900"> </span></pre> <pre><span style="font-size:16.0pt"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFORbDVsR_KzNWFPDoXxGfl4_79NBo7xmfybXX4vNkFb8Xagmw2w-bwpLOKiIwtQ0UGHZvJA_Gh7kVdrLEZAJQV6LsgnmdRVZvbQASOsP41ZLDqU6M0Nsc-MJkbp6XYw6We1irEu76Ads/s1600/image006-794109.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFORbDVsR_KzNWFPDoXxGfl4_79NBo7xmfybXX4vNkFb8Xagmw2w-bwpLOKiIwtQ0UGHZvJA_Gh7kVdrLEZAJQV6LsgnmdRVZvbQASOsP41ZLDqU6M0Nsc-MJkbp6XYw6We1irEu76Ads/s320/image006-794109.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5862065118753746546" /></a></span><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <p class="MsoNormal"><b><span style="font-size:20.0pt;font-family:"Garamond","serif";color:blue">Example 2:</span></b></p> <pre><span style="font-size:16.0pt;color:red">4) Another case arises when we recode missing data. 
Suppose the values with x<=4 are missing and we recode them as 99999.</span></pre> <pre><i><span style="font-size:16.0pt;color:#666666"># if x<=4, treat it as 99999</span></i><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222">x3=<a href="http://inside-r.org/r-doc/base/ifelse" target="_blank"><b><span style="color:#003399">ifelse</span></b></a></span><span style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#222222">x></span><span style="font-size:16.0pt;color:#cc66cc">4</span><span style="font-size:16.0pt;color:#339933">,</span><span style="font-size:16.0pt;color:#222222">x</span><span style="font-size:16.0pt;color:#339933">,</span><span style="font-size:16.0pt;color:#cc66cc">99999</span><span style="font-size:16.0pt;color:#009900">)</span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1aAVxYGtWzP5JJUt7FNI-Lnl1UgWmjtn8lIagltDuq7WcqhaklzlzZ-FaQ8Ruuq9x4VuQg4OkH13rqNVhyBowz3XS5sOnsuhoWkIeDnV_8m0pd0XCTNQpjjt1cY0JTzAywi8RffaCi2M/s1600/image007-795596.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1aAVxYGtWzP5JJUt7FNI-Lnl1UgWmjtn8lIagltDuq7WcqhaklzlzZ-FaQ8Ruuq9x4VuQg4OkH13rqNVhyBowz3XS5sOnsuhoWkIeDnV_8m0pd0XCTNQpjjt1cY0JTzAywi8RffaCi2M/s320/image007-795596.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5862065122557406178" /></a></span><span style="font-size:16.0pt;color:#222222"></span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <pre><span style="font-size:16.0pt;color:red">5) The regression coefficient then turns negative because of the 99999 values.</span></pre> <pre><i><span style="font-size:16.0pt;color:#666666"># relation between x3 and y</span></i></pre> <pre><span style="font-size:16.0pt;color:#222222"><a href="http://inside-r.org/r-doc/stats/lm" target="_blank"><b><span 
style="color:#003399">lm</span></b></a></span><span style="font-size:16.0pt;color:#009900">(</span><span style="font-size:16.0pt;color:#222222">y~x3</span><span style="font-size:16.0pt;color:#009900">)</span></pre> <pre><span style="font-size:16.0pt;color:#222222"> </span></pre> <p class="MsoNormal"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiigmWE6HI4JuNcxpAKdDL7qFzLTGboHrEQsjtmxlIdAjUWi9p0JDvkvR5dJqY9iwbtVOyAgi__mMSwr7jCl-X8-zmAukC_u7cIaWQhd0z3SVttjNscLFw-2NZzO_fqCw1Xs7APyukkpw/s1600/image009-797273.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiigmWE6HI4JuNcxpAKdDL7qFzLTGboHrEQsjtmxlIdAjUWi9p0JDvkvR5dJqY9iwbtVOyAgi__mMSwr7jCl-X8-zmAukC_u7cIaWQhd0z3SVttjNscLFw-2NZzO_fqCw1Xs7APyukkpw/s320/image009-797273.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5862065129651320082" /></a><span style="font-size:16.0pt"></span></p> <p class="MsoNormal"><span style="font-size:16.0pt"> </span></p><p class="MsoNormal" style><span style="font-size:16.0pt">The recoding issue can be viewed as a special case of a non-linear relationship.</span></p></div></div></div> </div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com1tag:blogger.com,1999:blog-3207512912949101943.post-57939565992251016942013-03-28T18:16:00.001-07:002013-03-28T18:16:42.669-07:00group obs almost evenly and calculate cumulative stats in SAS and R<div dir="ltr"><b><span style="font-family:'Courier New';color:navy;background-repeat:initial initial">The input has 99 records, and the requirement is to split them into 10 groups as evenly as possible (10 records in each of the first 9 groups and 9 records in the last group). 
Then compute the cumulative sum/mean in each group.</span></b><br><div class="gmail_quote"><div lang="EN-US" link="blue" vlink="purple"><div> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white"> </span></b></p> <p class="MsoNormal" style="text-autospace:none"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1oVZRODSBTJzozjb5Kl6xH9sxkahambGoCJJwPtmNTQzjcU0LRpWzDYqiBVEDGHJ3EPsoKBQ3_pliPRMaKS1HbIt_BDsRh-LAO1uwKIU9INLWHo1OajrQZvKF1JSIHWbyPRwHIu1CIrY/s1600/image001-702670.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1oVZRODSBTJzozjb5Kl6xH9sxkahambGoCJJwPtmNTQzjcU0LRpWzDYqiBVEDGHJ3EPsoKBQ3_pliPRMaKS1HbIt_BDsRh-LAO1uwKIU9INLWHo1OajrQZvKF1JSIHWbyPRwHIu1CIrY/s320/image001-702670.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5860567932841130402" /></a><b><span style="font-family:"Courier New";color:navy;background:white"></span></b></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white"> </span></b></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">The output looks like:</span></b></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white"> </span></b></p> <p class="MsoNormal" style="text-autospace:none"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoJaxSzH6_A50dE7DFm1d6ejZzYuSSmSG8-Td7dJgZlGAouDuIfo2uFdwnH_2WGNS5aBf1OUt56cG9FVLw6Tp1_CrdgZ74zn-1bsP5hS57NS05hBjzXj6eUu-YznmsmKgV-z32-bJxAqQ/s1600/image002-704831.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoJaxSzH6_A50dE7DFm1d6ejZzYuSSmSG8-Td7dJgZlGAouDuIfo2uFdwnH_2WGNS5aBf1OUt56cG9FVLw6Tp1_CrdgZ74zn-1bsP5hS57NS05hBjzXj6eUu-YznmsmKgV-z32-bJxAqQ/s320/image002-704831.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5860567937076512434" /></a><b><span 
style="font-family:"Courier New";color:navy;background:white"></span></b></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white"> </span></b></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">In SAS, a loop is needed here because of the cumulative sum.</span></b></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white"> </span></b></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">data</span></b><span style="background:white;font-family:"Courier New""> test;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> i=</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> p=</span><b><span style="font-family:"Courier New";color:teal;background:white">.99</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">output</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">do</span><span style="background:white;font-family:"Courier New""> i=</span><b><span style="font-family:"Courier New";color:teal;background:white">2</span></b><span style="background:white;font-family:"Courier New""> </span><span 
style="font-family:"Courier New";color:blue;background:white">to</span><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:teal;background:white">98</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> p=</span><b><span style="font-family:"Courier New";color:teal;background:white">.8</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">output</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> i=</span><b><span style="font-family:"Courier New";color:teal;background:white">99</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> p=</span><b><span style="font-family:"Courier New";color:teal;background:white">.2</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">output</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier 
New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">print</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">data</span><span style="background:white;font-family:"Courier New"">=test;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* PROC RANK cannot split these data evenly: tied values get the same rank, so all 97 obs with p=.8 end up in one group;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">rank</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">data</span><span style="background:white;font-family:"Courier New"">=test </span><span style="font-family:"Courier New";color:blue;background:white">out</span><span style="background:white;font-family:"Courier New"">=t groups=</span><b><span
style="font-family:"Courier New";color:teal;background:white">10</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">var</span><span style="background:white;font-family:"Courier New""> p;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">ranks</span><span style="background:white;font-family:"Courier New""> rank;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">print</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">data</span><span style="background:white;font-family:"Courier New"">=t;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">; </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* so the groups have to be assigned manually;</span><span
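style="background:white;font-family:'Courier New'"></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:'Courier New';color:green;background:white">* worked example, added only as an illustration: for test, nobs=99 and ngroup=10;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:'Courier New';color:green;background:white">* so n_per_grp=int(99/10)=9 and remainder=mod(99,10)=9;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:'Courier New';color:green;background:white">* the DO loop below then sets ps1-ps9=10 (one extra obs each while remainder is positive) and ps10=9;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:'Courier New';color:green;background:white">* check: 9*10+9=99, so every obs is assigned to exactly one group;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span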
style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> dsid=</span><span style="font-family:"Courier New";color:blue;background:white">%sysfunc</span><span style="background:white;font-family:"Courier New"">(open(test)); </span><span style="font-family:"Courier New";color:green;background:white">* open the data set;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> nobs=</span><span style="font-family:"Courier New";color:blue;background:white">%sysfunc</span><span style="background:white;font-family:"Courier New"">(attrn(&dsid,nobs)); </span><span style="font-family:"Courier New";color:green;background:white">* count the obs in the data set;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> rc=</span><span style="font-family:"Courier New";color:blue;background:white">%sysfunc</span><span style="background:white;font-family:"Courier New"">(close(&dsid)); </span><span style="font-family:"Courier New";color:green;background:white">* close it again so test is not left open;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> ngroup=10;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> overall_pct=.5;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">%put</span><span style="background:white;font-family:"Courier New""> &nobs;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* data set n_per_group has only one obs: the size of each group;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><b><span
style="font-family:"Courier New";color:navy;background:white">data</span></b><span style="background:white;font-family:"Courier New""> n_per_group;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> n_per_grp=int(&nobs/&</span><span style="font-family:"Courier New";color:teal;background:white">ngroup.</span><span style="background:white;font-family:"Courier New"">); </span><span style="font-family:"Courier New";color:green;background:white">* get quotient;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> remainder=mod(&nobs,&</span><span style="font-family:"Courier New";color:teal;background:white">ngroup.</span><span style="background:white;font-family:"Courier New"">); </span><span style="font-family:"Courier New";color:green;background:white">* get remainder;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">array</span><span style="background:white;font-family:"Courier New""> ps {&ngroup} ps1-ps&ngroup;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">keep</span><span style="background:white;font-family:"Courier New""> ps1-ps&ngroup;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">do</span><span style="background:white;font-family:"Courier New""> i=</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span 
style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">to</span><span style="background:white;font-family:"Courier New""> &ngroup;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">if</span><span style="background:white;font-family:"Courier New""> remainder></span><b><span style="font-family:"Courier New";color:teal;background:white">0</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> ps{i}=n_per_grp+</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> remainder=remainder-</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">else</span><span style="background:white;font-family:"Courier 
New""> ps{i}=n_per_grp;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">output</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">print</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">data</span><span style="background:white;font-family:"Courier New"">=n_per_group;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* read the single obs of n_per_group once with IF _N_=1 THEN DO, and since variables read by SET are retained, ps1-ps10 stay in the PDV until the end;</span><span style="background:white;font-family:"Courier
New""></span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">data</span></b><span style="background:white;font-family:"Courier New""> out(</span><span style="font-family:"Courier New";color:blue;background:white">drop</span><span style="background:white;font-family:"Courier New"">=freq _count_ i p);</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">if</span><span style="background:white;font-family:"Courier New""> _n_=</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">set</span><span style="background:white;font-family:"Courier New""> n_per_group;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> index=</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier 
New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">retain</span><span style="background:white;font-family:"Courier New""> freq _count_ </span><b><span style="font-family:"Courier New";color:teal;background:white">0</span></b><span style="background:white;font-family:"Courier New""> index ;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">array</span><span style="background:white;font-family:"Courier New""> ps(&ngroup) ps1-ps&ngroup;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">set</span><span style="background:white;font-family:"Courier New""> test </span><span style="font-family:"Courier New";color:blue;background:white">end</span><span style="background:white;font-family:"Courier New"">=last;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:green;background:white">* a little tricky: keep adding p until the number of added obs equals the size of the current group, ps(index);</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span
style="font-family:"Courier New";color:green;background:white">* when that happens, compute and output the group stats and reset the counters, otherwise keep adding;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">if</span><span style="background:white;font-family:"Courier New""> _count_=ps(index) </span><span style="font-family:"Courier New";color:blue;background:white">then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> num_obs=ps(index);</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> avg_pred_p=sum_p/num_obs;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> lift=avg_pred_p/&overall_pct;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">output</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> index+</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> _count_=</span><b><span style="font-family:"Courier New";color:teal;background:white">0</span></b><span
style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> sum_p=</span><b><span style="font-family:"Courier New";color:teal;background:white">0</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> sum_p+p; </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> _count_+</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">if</span><span style="background:white;font-family:"Courier New""> last </span><span style="font-family:"Courier New";color:blue;background:white">then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> num_obs=ps(index);</span></p> <p class="MsoNormal" style="text-autospace:none"><span 
style="background:white;font-family:"Courier New""> avg_pred_p=sum_p/num_obs;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> lift=avg_pred_p/&overall_pct;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">output</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">print</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">data</span><span style="background:white;font-family:"Courier New"">=out;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal"><span style="font-size:16.0pt;font-family:"Garamond","serif""> </span></p> <p class="MsoNormal"><span style="font-size:16.0pt;font-family:"Garamond","serif""> </span></p> <p 
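class="MsoNormal"><b><span style="font-family:'Courier New';color:navy;background:white">A loop-free variation is sketched below. It is only an illustration, not the method used above. Each observation's group number is computed directly from its position _N_, and PROC MEANS then does the summing. The sketch assumes the data set test created above and reuses the post's settings ngroup=10 and overall_pct=.5. The names grouped, grp, q, r and lift_tbl are made up for this example. As in n_per_group, the first mod(n,10) groups get int(n/10)+1 obs and the remaining groups get int(n/10) obs.</span></b></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">%let ngroup=10;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">%let overall_pct=.5;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> </span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:'Courier New';color:green;background:white">* sketch: assign each obs to a group from its position _n_;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">data grouped;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> set test nobs=n; * n = number of obs in test;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> q=int(n/&ngroup); * base group size;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> r=mod(n,&ngroup); * number of groups that get one extra obs;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> if _n_ le r*(q+1) then grp=ceil(_n_/(q+1));</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> else grp=r+ceil((_n_-r*(q+1))/q);</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> drop q r;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">run;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> </span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:'Courier New';color:green;background:white">* average p within each group, then divide by the overall rate;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">proc means data=grouped noprint nway;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> class grp;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> var p;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> output out=lift_tbl(drop=_type_ _freq_) n=num_obs mean=avg_pred_p;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">run;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> </span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">data lift_tbl;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> set lift_tbl;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> lift=avg_pred_p/&overall_pct;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">run;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'"> </span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">proc print data=lift_tbl;</span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:'Courier New'">run;</span></p>
<p class="MsoNormal"><b><span style="font-family:'Courier New';color:navy;background:white">With the 99 obs in test, this should give num_obs=10 for groups 1 to 9 and num_obs=9 for group 10, the same split as ps1-ps10, along with the same avg_pred_p and lift values as data set out. The IF/ELSE split also avoids dividing by q=0 when there are fewer obs than groups. In that case r*(q+1) equals n, so the ELSE branch is never reached.</span></b></p>
<p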
class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New";color:#00a84c">## It is very easy to do this in R</span></p> <p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New";color:#00a84c">## a simple way</span></p> <p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New";color:blue">rm(list=ls())</span></p> <p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New"">x=<span style="color:blue">c</span>(.9,<span style="color:blue">rep</span>(.8,97),.2)</span></p> <p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New"">ngrp=10</span></p> <p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New"">nobs=<span style="color:blue">rep</span>(<span style="color:blue">length</span>(x)<span style="color:blue">%/%</span>ngrp, ngrp)+<span style="color:blue">c</span>(<span style="color:blue">rep</span>(1,<span style="color:blue">length</span>(x)<span style="color:blue">%%</span>ngrp), <span style="color:blue">rep</span>(0,ngrp-<span style="color:blue">length</span>(x)<span style="color:blue">%%</span>ngrp))</span></p> <p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New"">levl=<span style="color:blue">rep</span>(1:ngrp, nobs)</span></p> <p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New"">df=<span style="color:blue">data.frame</span>(<span style="color:blue">cbind</span>(x,levl))</span></p> <p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Courier New";color:blue">aggregate</span><span style="font-size:12.0pt;font-family:"Courier New"">(x~levl, df, mean)</span></p> </div> </div></div><br></div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com6tag:blogger.com,1999:blog-3207512912949101943.post-26895315627067290702013-03-12T13:16:00.001-07:002013-03-12T13:16:55.197-07:00examples of sas proc sql connect to db with execute<div dir="ltr"><span 
style="color:green;font-family:'Courier New'">*** sas proc sql connect to db with execute;</span><br><div class="gmail_quote"><div lang="EN-US" link="blue" vlink="purple"><p class="MsoNormal" style="text-autospace:none"> <span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* the same query two ways: processed by the SAS SQL processor, and passed through to the remote server;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* processed by the SAS SQL processor;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">sql</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">select</span><span style="background:white;font-family:"Courier New""> employee_title </span><span style="font-family:"Courier New";color:blue;background:white">as</span><span style="background:white;font-family:"Courier New""> title, avg(employee_years), </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> freq(employee_id)</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">from</span><span
style="background:white;font-family:"Courier New""> sql.employee</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">group</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> title</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">order</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> title;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* passed through to the remote server, which needs an earlier CONNECT TO statement for the connection named remote;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">select</span><span style="background:white;font-family:"Courier New""> * </span><span style="font-family:"Courier New";color:blue;background:white">from</span><span style="background:white;font-family:"Courier New""> connection to remote</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> (</span><span style="font-family:"Courier New";color:blue;background:white">select</span><span style="background:white;font-family:"Courier New""> employee_title </span><span style="font-family:"Courier New";color:blue;background:white">as</span><span style="background:white;font-family:"Courier New""> title, </span></p> <p class="MsoNormal" style="text-autospace:none"><span
style="background:white;font-family:"Courier New""> avg(employee_years), </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> freq(employee_id)</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">from</span><span style="background:white;font-family:"Courier New""> sql.employee</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">group</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> title</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">order</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> title);</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">quit</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* drop a table, re-create it and insert one row, each statement executed by the DBMS through an earlier CONNECT TO statement with the alias db;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">sql</span></b><span
style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">execute</span><span style="background:white;font-family:"Courier New"">(drop table </span><span style="font-family:"Courier New";color:purple;background:white">' My Invoice '</span><span style="background:white;font-family:"Courier New"">) </span><span style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> db;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">execute</span><span style="background:white;font-family:"Courier New"">(create table </span><span style="font-family:"Courier New";color:purple;background:white">' My Invoice '</span><span style="background:white;font-family:"Courier New"">(</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:purple;background:white">' Invoice Number '</span><span style="background:white;font-family:"Courier New""> LONG not null,</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:purple;background:white">' Billed To '</span><span style="background:white;font-family:"Courier New""> VARCHAR(</span><b><span style="font-family:"Courier New";color:teal;background:white">20</span></b><span style="background:white;font-family:"Courier New"">),</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:purple;background:white">' Amount 
'</span><span style="background:white;font-family:"Courier New""> CURRENCY,</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:purple;background:white">' BILLED ON '</span><span style="background:white;font-family:"Courier New""> DATETIME)) </span><span style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> db;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">execute</span><span style="background:white;font-family:"Courier New"">(insert into </span><span style="font-family:"Courier New";color:purple;background:white">' My Invoice '</span><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> values( </span><b><span style="font-family:"Courier New";color:teal;background:white">12345</span></b><span style="background:white;font-family:"Courier New"">, </span><span style="font-family:"Courier New";color:purple;background:white">'John Doe'</span><span style="background:white;font-family:"Courier New"">, </span><b><span style="font-family:"Courier New";color:teal;background:white">123.45</span></b><span style="background:white;font-family:"Courier New"">, #</span><b><span style="font-family:"Courier New";color:teal;background:white">11</span></b><span style="background:white;font-family:"Courier New"">/</span><b><span style="font-family:"Courier New";color:teal;background:white">22</span></b><span style="background:white;font-family:"Courier New"">/</span><b><span style="font-family:"Courier New";color:teal;background:white">2003</span></b><span style="background:white;font-family:"Courier New"">#)) </span><span 
style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> db;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">quit</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* insert into table base from another table named one;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">sql</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">insert</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">into</span><span style="background:white;font-family:"Courier New""> base ( source, a, b, d )</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">select source, a, b, </span><span style="font-family:"Courier New";color:purple;background:white">' '</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> from one;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">quit</span></b><span 
style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* update the db table facebook_ads with values from table sasprod.fb_ad_fromsas, matching rows on ad_id;</span><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">sql</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">connect</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">to</span><span style="background:white;font-family:"Courier New""> oracle (user=sas password=sas path=db1);</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">execute</span><span style="background:white;font-family:"Courier New""> (update db1.facebook_ads m set</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> (m.rank, m.ad_work_item_id)=</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> (select a.rank, a.ad_work_item_id</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> from sasprod.fb_ad_fromsas 
a</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> where m.ad_id=a.ad_id)</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> where exists (select </span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New""> from sasprod.fb_ad_fromsas a where m.ad_id=a.ad_id)</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> )</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> oracle;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">disconnect</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">from</span><span style="background:white;font-family:"Courier New""> oracle;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">quit</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* update a table in a MySQL database;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">* update the conversion_rate and updated_at 
in table projected_conversion_rates in the MySQL database reporting_production with the values from sasdb.projected_merchant_conv_sas, matching rows on payer_id, event_id, cutoff_min and cutoff_max;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">sql</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">connect</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">to</span><span style="background:white;font-family:"Courier New""> mysql (user=xxx password=xxx database=reporting_production server=</span><span style="font-family:"Courier New";color:purple;background:white">"server_m1"</span><span style="background:white;font-family:"Courier New""> port=</span><b><span style="font-family:"Courier New";color:teal;background:white">8888</span></b><span style="background:white;font-family:"Courier New"">);</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">execute</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> (update projected_conversion_rates pcr inner join sasdb.projected_merchant_conv_sas a on (pcr.payer_id=a.payer_id and pcr.event_id=a.event_id and pcr.cutoff_min=a.cutoff_min and pcr.cutoff_max=a.cutoff_max)</span></p> <p 
class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> set pcr.conversion_rate=a.conversion_rate, pcr.updated_at=a.updated_at)</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">by</span><span style="background:white;font-family:"Courier New""> mysql;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">disconnect</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">from</span><span style="background:white;font-family:"Courier New""> mysql;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">quit</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">libname</span><span style="background:white;font-family:"Courier New""> mylib </span><span style="font-family:"Courier New";color:purple;background:white">'c:\sales'</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier 
New";color:green;background:white">* another example: join a remote DB2 table with a local SAS table;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">sql</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">connect</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">to</span><span style="background:white;font-family:"Courier New""> remote </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> (server=tso.shr1 dbms=db2 </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> dbmsarg=(ssid=db2p));</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">select</span><span style="background:white;font-family:"Courier New""> * </span><span style="font-family:"Courier New";color:blue;background:white">from</span><span style="background:white;font-family:"Courier New""> mylib.sales08, </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> connection to remote </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> (</span><span style="font-family:"Courier New";color:blue;background:white">select</span><span 
style="background:white;font-family:"Courier New""> qtr, division, </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> sales, pct</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">from</span><span style="background:white;font-family:"Courier New""> revenue.all08</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">where</span><span style="background:white;font-family:"Courier New""> region=</span><span style="font-family:"Courier New";color:purple;background:white">'Southeast'</span><span style="background:white;font-family:"Courier New"">)</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">where</span><span style="background:white;font-family:"Courier New""> sales08.div=division;</span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">quit</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">*** docs: <a href="http://support.sas.com/documentation/cdl/en/connref/61908/HTML/default/viewer.htm#srspt.htm" target="_blank">http://support.sas.com/documentation/cdl/en/connref/61908/HTML/default/viewer.htm#srspt.htm</a>;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p 
class="MsoNormal"><br></p></div></div></div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com3tag:blogger.com,1999:blog-3207512912949101943.post-6461317606905124322013-03-05T19:08:00.001-08:002013-03-05T19:08:50.655-08:00Compare the distributions of variables in two data sets<div dir="ltr"><span style="color:green;font-family:'Courier New'">*** The purpose here is to compare the distributions of variables in two data sets ;</span><br><div class="gmail_quote"><div lang="EN-US" link="blue" vlink="purple"> <div><p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:green;background:white">*** For example, if we use January data to build a model and then use the model to score February data ;</span><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:green;background:white">*** then we need to make sure the distributions of the predictors are the same in both months ;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:blue;background:white"> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> changed=0;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" 
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <b><span style="font-family:"Courier New";color:navy;background:white">%macro</span></b><span style="background:white;font-family:"Courier New""> chisq(vars, change_level_vars, timeframe);</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New"">data _null_;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> array a_var(*) &vars;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> call symput(</span><span style="font-family:"Courier New";color:purple;background:white">'nvar'</span><span style="background:white;font-family:"Courier New"">, ktrim(kleft(put(dim(a_var),</span><b><span style="font-family:"Courier New";color:teal;background:white">8.</span></b><span style="background:white;font-family:"Courier New"">))));</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New"">run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:blue;background:white">%do</span><span style="background:white;font-family:"Courier New""> 
i=</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%to</span><span style="background:white;font-family:"Courier New""> &nvar;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span> <span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> var=</span><span style="font-family:"Courier New";color:blue;background:white">%scan</span><span style="background:white;font-family:"Courier New"">(&vars, &i);</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span> </p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span> <span style="font-family:"Courier New";color:blue;background:white">%if</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%sysfunc</span><span style="background:white;font-family:"Courier New"">(indexw(</span><span style="font-family:"Courier New";color:blue;background:white">%upcase</span><span style="background:white;font-family:"Courier New"">(&change_level_vars), </span><span style="font-family:"Courier New";color:blue;background:white">%upcase</span><span style="background:white;font-family:"Courier New"">(&var))) </span><span style="font-family:"Courier New";color:blue;background:white">%then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc freq data=forchisq;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> tables &var * 
&timeframe / out=comp_&var (drop=percent);</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> data comp_&var;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> merge comp_&var(in=in1 where=(&timeframe=</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">) rename=(count=old)) comp_&var(in=in2 where=(&timeframe=</span><b><span style="font-family:"Courier New";color:teal;background:white">2</span></b><span style="background:white;font-family:"Courier New"">) rename=(count=new)); </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> by &var;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> drop &timeframe;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:green;background:white">* keep only the levels that appear in just one time frame (new or deleted levels);</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> if in1 and in2 then delete;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" 
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc sql;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> select count(*) into: n_changed from comp_&var;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> quit;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%if</span><span style="background:white;font-family:"Courier New""> &n_changed > </span><b><span style="font-family:"Courier New";color:teal;background:white">0</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> changed=1;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc printto file=toemail;</span></p> <p class="MsoNormal" 
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> title </span><span style="font-family:"Courier New";color:purple;background:white">"New or deleted levels for &var"</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc print data=comp_&var width=min;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc printto;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span> <span style="font-family:"Courier New";color:blue;background:white">%end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" 
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span> <span style="font-family:"Courier New";color:blue;background:white">%else</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc sql;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> select count(distinct &var) into: m_new_&var from forchisq where &timeframe=</span><b><span style="font-family:"Courier New";color:teal;background:white">2</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> select count(distinct &var) into: m_old_&var from forchisq where &timeframe=</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> quit;</span></p> <p 
class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%if</span><span style="background:white;font-family:"Courier New""> &&m_new_&var ne &&m_old_&var </span><span style="font-family:"Courier New";color:blue;background:white">%then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> changed=1;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc printto file=toemail;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> title </span><span style="font-family:"Courier New";color:purple;background:white">"&var has a different number of levels in the two time frames:"</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" 
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc freq data=forchisq;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> tables &var * &timeframe / missing nocum nopercent;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc printto;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%else</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" 
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc freq data=forchisq(where=(&timeframe=</span><b><span style="font-family:"Courier New";color:teal;background:white">1</span></b><span style="background:white;font-family:"Courier New"">));</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> tables &var / missing out=old_dist;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc sql;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> select percent into: percents separated by </span><span style="font-family:"Courier New";color:purple;background:white">" "</span><span style="background:white;font-family:"Courier New""> from old_dist;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> quit;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span 
style="background:white;font-family:"Courier New""> proc freq data=forchisq(where=(&timeframe=</span><b><span style="font-family:"Courier New";color:teal;background:white">2</span></b><span style="background:white;font-family:"Courier New"">));</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:green;background:white">*weight &wt;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> table &var / missing chisq testp=(&percents);</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> output out=chisq_&var(rename=(_PCHI_=Chisq P_PCHI=p_value)) chisq pchi;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc print data=chisq_&var width=min;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> title </span><span style="font-family:"Courier New";color:purple;background:white">"Chisq values for &var"</span><span style="background:white;font-family:"Courier New"">;</span></p> <p 
class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc sql;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> select round(p_value, </span><b><span style="font-family:"Courier New";color:teal;background:white">0.001</span></b><span style="background:white;font-family:"Courier New"">) into: pvalue from chisq_&var;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> quit;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%if</span><span style="background:white;font-family:"Courier New""> %sysevalf(&pvalue<</span><b><span style="font-family:"Courier New";color:teal;background:white">0.01</span></b><span style="background:white;font-family:"Courier New"">) </span><span style="font-family:"Courier New";color:blue;background:white">%then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%do</span><span 
style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> changed=1;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc printto file=toemail;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc freq data=forchisq;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> title </span><span style="font-family:"Courier New";color:purple;background:white">"&var distribution has changed."</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> table &var * &timeframe / missing norow nocum nopercent;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" 
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> proc printto;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> run;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span> <span style="font-family:"Courier New";color:blue;background:white">%end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:blue;background:white">%end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" 
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:blue;background:white">%if</span><span style="background:white;font-family:"Courier New""> &changed></span><b><span style="font-family:"Courier New";color:teal;background:white">0</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%then</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">%do</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> x cat toemail.lst | mail -s </span><span style="font-family:"Courier New";color:purple;background:white">"Data distribution change report"</span><span style="background:white;font-family:"Courier New""> &notify; </span><span style="font-family:"Courier New";color:blue;background:white">%end</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <b><span style="font-family:"Courier New";color:navy;background:white">%mend</span></b><span style="background:white;font-family:"Courier New""> chisq;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span 
style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:green;background:white">*** test the macro with the Titanic train data set from Kaggle;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:green;background:white">*** The first 448 obs were set to timeframe=1 and the rest to 2;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">import</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">datafile</span><span style="background:white;font-family:"Courier New"">=</span><span style="font-family:"Courier New";color:purple;background:white">"/home/test/train.csv"</span><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">out</span><span style="background:white;font-family:"Courier New"">=forchisq(</span><span style="font-family:"Courier New";color:blue;background:white">rename</span><span style="background:white;font-family:"Courier New"">=(time=timeframe)) </span><span style="font-family:"Courier 
New";color:blue;background:white">replace</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span> <span style="font-family:"Courier New";color:blue;background:white">getnames</span><span style="background:white;font-family:"Courier New"">=yes;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <b><span style="font-family:"Courier New";color:navy;background:white">proc</span></b><span style="background:white;font-family:"Courier New""> </span><b><span style="font-family:"Courier New";color:navy;background:white">print</span></b><span style="background:white;font-family:"Courier New""> </span><span style="font-family:"Courier New";color:blue;background:white">data</span><span style="background:white;font-family:"Courier New"">=forchisq </span><span style="font-family:"Courier New";color:blue;background:white">width</span><span style="background:white;font-family:"Courier New"">=min;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <b><span style="font-family:"Courier New";color:navy;background:white">run</span></b><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span 
style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> vars=survived pclass sex sibsp parch embarked;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> change_level_vars=cabin;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="font-family:"Courier New";color:blue;background:white">%let</span><span style="background:white;font-family:"Courier New""> notify=<a href="mailto:hsong@nextag.com" target="_blank">hsong@nextag.com</a>;</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New"">%<b><i>chisq</i></b>(&vars, &change_level_vars, timeframe);</span></p> <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal;text-autospace:none"> <span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal"><span style="font-family:"Courier 
New"">The results are:</span></p> <p class="MsoNormal"><span style="font-family:"Courier New"">1) For survived, <span style="background:silver">we cannot reject the hypothesis that the distributions in the two data sets are the same, since the p-value is .3299</span>.</span></p> <p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhho8uiyZC5PAlyQAel0GfVuoxSK4_weCijNRlTl6pi3_SogpVGsEZrNj2qNRldtT5_J59KL6Dx68lsUcMHtvgESxd61Z_iHVMwBVvBXf41etk9ddw0ggn2oToknFVOkCcHjL2vEIFqU0Y/s1600/image001-730656.jpg"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhho8uiyZC5PAlyQAel0GfVuoxSK4_weCijNRlTl6pi3_SogpVGsEZrNj2qNRldtT5_J59KL6Dx68lsUcMHtvgESxd61Z_iHVMwBVvBXf41etk9ddw0ggn2oToknFVOkCcHjL2vEIFqU0Y/s320/image001-730656.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5852061862663704018" /></a><span style="font-family:"Courier New""></span></p> <p class="MsoNormal"><span style="font-family:"Courier New"">2) For pclass, the result is:</span></p> <p class="MsoNormal"><span style="font-family:"Courier New""> </span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnvrEdbOVWDGNAcloo12qXATMa24PhghC_QiMpn6U6TtOIydAefKxsbRefpTlJnpFkVLarg9HYpS7WM7Uq1xVFJjCv-s_vUhG1ntqisEV6GSZfn-ze1S15AhzcpHgg8PCZJImeGti7Aos/s1600/image002-733713.jpg"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnvrEdbOVWDGNAcloo12qXATMa24PhghC_QiMpn6U6TtOIydAefKxsbRefpTlJnpFkVLarg9HYpS7WM7Uq1xVFJjCv-s_vUhG1ntqisEV6GSZfn-ze1S15AhzcpHgg8PCZJImeGti7Aos/s320/image002-733713.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5852061876852308162" /></a><span style="font-family:"Courier New""></span></p> <p class="MsoNormal"><span style="font-family:"Courier New"">3) For sex, <span style="background:silver">the p-value is below .01</span>, so we conclude that the distribution of this variable differs between the two data sets.</span></p> <p class="MsoNormal"><span style="font-family:"Courier New""> </span><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgi1uRtD3M9Rj7P8QTXRtOAVXOK9kVnn4U1onTDZjzhsqUFoyFtP7sFyjIDO7O5kXq-0hz_Ee_oyPwgb8qHOF5jg9B532XOVl8WFAKVGi-k5CEysI0JgXGHOeA9wmosUeBqY7QhJeJngPc/s1600/image003-734947.jpg"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgi1uRtD3M9Rj7P8QTXRtOAVXOK9kVnn4U1onTDZjzhsqUFoyFtP7sFyjIDO7O5kXq-0hz_Ee_oyPwgb8qHOF5jg9B532XOVl8WFAKVGi-k5CEysI0JgXGHOeA9wmosUeBqY7QhJeJngPc/s320/image003-734947.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5852061885564671554" /></a><span style="font-family:"Courier New""></span></p> <p class="MsoNormal"><span style="font-family:"Courier New"">4) For sibsp, the result is:</span></p> <p class="MsoNormal"><span style="font-family:"Courier New""> </span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEil3TRk4Ezq94u7c_Aa5R_RQEUg7RxCs7oZA32CtWGdblLxtDv_RH3vMJS45zXC2e2c2oWJsu7fNxOSyxTZVGQK49E6HZXmgB9g1Z0QM4NEaNR27hmzXzLVtuOO0Mc9-JPOJlYFVbgaSvM/s1600/image004-737127.jpg"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEil3TRk4Ezq94u7c_Aa5R_RQEUg7RxCs7oZA32CtWGdblLxtDv_RH3vMJS45zXC2e2c2oWJsu7fNxOSyxTZVGQK49E6HZXmgB9g1Z0QM4NEaNR27hmzXzLVtuOO0Mc9-JPOJlYFVbgaSvM/s320/image004-737127.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5852061894617680274" /></a><span style="font-family:"Courier New""></span></p> <p class="MsoNormal"><span style="font-family:"Courier New"">5) For parch, we check whether any levels were added or deleted. The result shows one new level, 6:</span></p> <p class="MsoNormal"><span style="font-family:"Courier New""> </span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9FzLRuOCWtyXNKFrY8WfcLLWN2rlQ8_JIgXeEhC_2F_kfoyKkNPIYK4m1MCzrNXNiP57cVF-UdIkeca5n2e_6W2CbY9JOQOi1mg-_ktcyLYcRouHZ7O8CdnKj9N28Fw1tVPygQrFBBbk/s1600/image005-739047.png"><img 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9FzLRuOCWtyXNKFrY8WfcLLWN2rlQ8_JIgXeEhC_2F_kfoyKkNPIYK4m1MCzrNXNiP57cVF-UdIkeca5n2e_6W2CbY9JOQOi1mg-_ktcyLYcRouHZ7O8CdnKj9N28Fw1tVPygQrFBBbk/s320/image005-739047.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5852061903435154642" /></a><span style="font-family:"Courier New""></span></p> <p class="MsoNormal"><span style="font-family:"Courier New"">6) For embarked, the result is:</span></p> <p class="MsoNormal"><span style="font-family:"Courier New""> </span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkDsW4oxuOnTKVRBZJcn06Zm4T7DyB9OIXYas3O1kv8bo3ollvb7Vy1qJ0ILAFXb7GJWjHAC7IJ6Ep5OveZE7Mbiag7jiWZY-pWfH5uf4knXQPcdzMWpCIZt8SDzDaYrUOMrBCsJyRZkc/s1600/image006-740814.jpg"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkDsW4oxuOnTKVRBZJcn06Zm4T7DyB9OIXYas3O1kv8bo3ollvb7Vy1qJ0ILAFXb7GJWjHAC7IJ6Ep5OveZE7Mbiag7jiWZY-pWfH5uf4knXQPcdzMWpCIZt8SDzDaYrUOMrBCsJyRZkc/s320/image006-740814.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5852061907137912450" /></a><span style="font-family:"Courier New""></span></p> <p class="MsoNormal"><span style="font-family:"Courier New""> </span></p></div></div></div></div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-80192195391281932932013-02-01T17:42:00.001-08:002013-02-01T17:42:22.972-08:00Empirical logit plot between x and binary to check their linear relationship<div dir="ltr"><br><div class="gmail_quote"> <div lang="EN-US" link="blue" vlink="purple"> <div> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">** purpose: draw the empirical logit plot between x and binary y ;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:green;background:white">** to check if there is 
linear relation or not ;</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">%macro</span></b><span style="background:white;font-family:"Courier New""> empplot(indata, xvar, yvar);</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">proc rank data=&indata groups=</span><b><span style="font-family:"Courier New";color:teal;background:white">100</span></b><span style="background:white;font-family:"Courier New""> out=out;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> var &xvar;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> ranks bin;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">run;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">proc means data=out noprint nway;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> class bin;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> var &yvar &xvar;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> output out=bins sum(&yvar)=&yvar mean(&xvar)=&xvar;</span></p> <p class="MsoNormal" 
style="text-autospace:none"><span style="background:white;font-family:"Courier New"">run;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">data bins;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> set bins;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> elogit=log((&yvar+(sqrt(_freq_)/</span><b><span style="font-family:"Courier New";color:teal;background:white">2</span></b><span style="background:white;font-family:"Courier New"">))/(_freq_-&yvar+(sqrt(_freq_)/</span><b><span style="font-family:"Courier New";color:teal;background:white">2</span></b><span style="background:white;font-family:"Courier New"">)));</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">run;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">proc sgplot data=bins;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> reg y=elogit x=&xvar / curvelabel=</span><span style="font-family:"Courier New";color:purple;background:white">"Linear Relationship?"</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> curvelabelloc=outside</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> lineattrs=(color=ligr);</span></p> <p class="MsoNormal" style="text-autospace:none"><span 
style="background:white;font-family:"Courier New""> series y=elogit x=&xvar;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> title </span><span style="font-family:"Courier New";color:purple;background:white">"Empirical Logit against &xvar"</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">run;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">proc sgplot data=bins;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> reg y=elogit x=bin /</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> curvelabel=</span><span style="font-family:"Courier New";color:purple;background:white">"Linear Relationship?"</span><span style="background:white;font-family:"Courier New""></span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> curvelabelloc=outside</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> lineattrs=(color=ligr);</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> series y=elogit x=bin;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> title </span><span style="font-family:"Courier New";color:purple;background:white">"Empirical Logit against Binned &xvar"</span><span style="background:white;font-family:"Courier New"">;</span></p> <p class="MsoNormal" style="text-autospace:none"><span 
style="background:white;font-family:"Courier New"">run;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><b><span style="font-family:"Courier New";color:navy;background:white">%mend</span></b><span style="background:white;font-family:"Courier New""> empplot;</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New""> </span></p> <p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New";color:blue;background:white">An example: the earlier univariate screening showed a relatively strong relationship between recentview_count and action, so we plot their empirical logit below:</span></p> <p class="MsoNormal" style="text-autospace:none"><span style="background:white;font-family:"Courier New"">%<b><i>empplot</i></b>(dyps.dyps_trainoversamp2, recentview_count, action);</span></p> <p class="MsoNormal"> </p> <p class="MsoNormal"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgADoTENOlKBnv1utl3U51uMD8rpG8anPthHYmU-CyZOg1YznqpA2M8eyYei6KKFHG-jJhPxnh6iySKqlTOuL2CqBngJiV7o3YFDXKSelPv9gQwNgiE7cGypjNSs33OZsCQhyphenhyphenq2efb0VTw/s1600/image001-742973.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgADoTENOlKBnv1utl3U51uMD8rpG8anPthHYmU-CyZOg1YznqpA2M8eyYei6KKFHG-jJhPxnh6iySKqlTOuL2CqBngJiV7o3YFDXKSelPv9gQwNgiE7cGypjNSs33OZsCQhyphenhyphenq2efb0VTw/s320/image001-742973.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5840164860928886370" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjErMAxBRV5HgJIEhzrMJugwKy9GpLiNMHgUeoBUG3mma1Ossze7SXpsEaohT-Kx_ZhrJ5-Uq4GzqZUdXbFMfSTqHNn6Nf4EgAyU1k3-WpCvXTKLlZJMrq1DAoyX5rT3irj90dFVrFBaH0/s1600/image002-745786.png"><img 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjErMAxBRV5HgJIEhzrMJugwKy9GpLiNMHgUeoBUG3mma1Ossze7SXpsEaohT-Kx_ZhrJ5-Uq4GzqZUdXbFMfSTqHNn6Nf4EgAyU1k3-WpCvXTKLlZJMrq1DAoyX5rT3irj90dFVrFBaH0/s320/image002-745786.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5840164867598366338" /></a></p> <p class="MsoNormal"> </p> </div> </div></div><br></div> sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-62589634386913495492013-02-01T11:56:00.001-08:002013-02-01T11:56:59.587-08:00Check the correlation between Independent variables with Binary Dependent variable by Spearman and Hoeffding's D statistic<div dir="ltr">
<span style="font-family: Garamond,serif; font-size: 16pt;">1: How do we check the correlation between the Independent Variables (IVs) and a binary Dependent Variable (DV)? Here the Spearman statistic and Hoeffding's D statistic are used. They are preferable to the Pearson correlation because they are less sensitive to outliers and nonlinearities: both work on the ranks of the IV and the DV rather than on the raw values. </span><span style="background-color: silver; font-family: Garamond,serif; font-size: 16pt;">Usually a variable's Spearman ranking should follow a similar trend to its Hoeffding ranking. If the two disagree, the relationship is likely nonlinear (not monotonic).</span><br />
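<div class="MsoNormal">For reference, here are the two statistics as defined in the SAS PROC CORR documentation. The notation is introduced here, not taken from this post: n is the number of observations, R_i and S_i are the ranks of x_i and y_i, and Q_i is the bivariate rank, that is, 1 plus the number of points whose x and y values are both smaller than those of point i. Ties get fractional credit as described in that documentation.</div>

```latex
% Spearman: the Pearson correlation computed on the ranks
r_s = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}
           {\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 \, \sum_{i=1}^{n} (S_i - \bar{S})^2}}

% Hoeffding's D, as scaled by PROC CORR
D = 30 \, \frac{(n-2)(n-3)\,D_1 + D_2 - 2(n-2)\,D_3}{n(n-1)(n-2)(n-3)(n-4)},
\qquad
\begin{aligned}
D_1 &= \sum_{i=1}^{n} (Q_i - 1)(Q_i - 2) \\
D_2 &= \sum_{i=1}^{n} (R_i - 1)(R_i - 2)(S_i - 1)(S_i - 2) \\
D_3 &= \sum_{i=1}^{n} (R_i - 2)(S_i - 2)(Q_i - 1)
\end{aligned}
```

<div class="MsoNormal">Spearman's r_s only picks up monotonic association. D reacts to any departure from independence; with no ties, the documentation gives its range as -0.5 to 1. This is why a variable can look weak on Spearman but strong on Hoeffding's D when its relationship with the target is non-monotonic, for example U-shaped.</div>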
<div class="gmail_quote">
<div lang="EN-US" link="blue" vlink="purple">
<div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: "Garamond","serif"; font-size: 16.0pt;">If both Spearman and Hoeffding's D show low correlations, those variables can be dropped,</span><span style="font-family: "Garamond","serif"; font-size: 16.0pt;"> like price_change_pct, avg_rating, total_ratings and num_sellers in the example below.</span></div>
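<div class="MsoNormal">Below is a minimal stand-alone sketch of that drop rule in SAS. It is separate from the macro that follows. Everything in it is a hypothetical assumption for illustration:</div>
<ul>
<li>The simulated data set SIM, where x1 has a monotonic effect on the binary target y, x2 a U-shaped effect, and x3 none.</li>
<li>The 0.05 cutoff.</li>
<li>Judging "low" by p-value rather than by the size of the statistic.</li>
<li>That the ODS tables SpearmanCorr and HoeffdingCorr store the p-value for input xk in a column named Pxk. Run PROC CONTENTS on them if your release names the columns differently.</li>
</ul>

```sas
/* Hypothetical simulated data, not from the original post:
   y is binary; x1 acts monotonically, x2 U-shaped, x3 not at all. */
data sim;
  call streaminit(2013);
  do i = 1 to 2000;
    x1  = rand('normal');
    x2  = rand('normal');
    x3  = rand('normal');
    eta = -0.5 + x1 + (x2**2 - 1);
    y   = rand('bernoulli', 1 / (1 + exp(-eta)));
    output;
  end;
  drop i eta;
run;

%let cutoff = 0.05;  /* assumed significance level */

/* capture both association tables produced by PROC CORR */
ods output SpearmanCorr=spearman HoeffdingCorr=hoeffding;
proc corr data=sim spearman hoeffding nosimple;
  var x1-x3;
  with y;
run;

/* Assumption: each ODS table has one row (for y) and stores the
   p-value of input xk in column Pxk. */
data screen(keep=variable p_spearman p_hoeffding keep_var);
  merge spearman (keep=Px1-Px3 rename=(Px1-Px3=s1-s3))
        hoeffding(keep=Px1-Px3 rename=(Px1-Px3=h1-h3));
  length variable $32;
  array s{3} s1-s3;
  array h{3} h1-h3;
  do k = 1 to 3;
    variable    = cats('x', k);
    p_spearman  = s{k};
    p_hoeffding = h{k};
    /* keep the input unless BOTH statistics look weak */
    keep_var    = (p_spearman < &cutoff or p_hoeffding < &cutoff);
    output;
  end;
run;

proc print data=screen noobs;
run;
```

<div class="MsoNormal">A variable is flagged for dropping (keep_var=0) only when neither test is significant. With this setup one would expect x3 to be the drop candidate, and x2 to show a clearer signal under Hoeffding's D than under Spearman. The exact p-values depend on the seed.</div>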
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Garamond","serif"; font-size: 16.0pt;">The following macro tests a group of IVs against the DV:</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: Consolas; font-size: 10.0pt;"><span style="background: silver;">*===============================================================================================;</span></span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">** use proc corr to examine the association between the inputs and the target var ;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">** Spearman corr is a corr of the ranks of the input var with the binary target, ;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">** it's less sensitive to nonlinearities and outliers than the Pearson correlation ;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">** Hoeffding's D statistic is also used to check the association ;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">** a var with a poor spearman rank (large number) but a good hoeffding rank (small number) likely has a non-monotonic association ;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">*===============================================================================================;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">** The output is a macro var containing the variables selected by univariate screening ;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">*===============================================================================================;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">%macro uniscreen(indata, varfile, target, pvalue);</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">filename varfile "&varfile";</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">** filename varfile "/home/hsong/varlist.txt";</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">data varall;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> infile varfile delimiter=',';</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> length varname $1000.;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> input varname $ @@;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc print data=varall width=min;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">title "print of varall";</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc sql;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> select varname into: inputs separated by ' ' from varall;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> select count(*) into: nobs from varall;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">quit;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">%let nvar=%sysfunc(compress(&nobs));</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">ods html close;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">ods output spearmancorr=spearman hoeffdingcorr=hoeffding;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc corr data=&indata spearman hoeffding rank;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> var &inputs;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> with &target;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">ods html;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">data spearman1(keep=variable scorr spvalue ranksp);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> length variable $ 80.;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> set spearman;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> array best(*) best1--best&nvar;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> array r(*) r1--r&nvar;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> array p(*) p1--p&nvar;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> do i=1 to dim(best);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> variable=best(i);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> scorr=r(i);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> spvalue=p(i);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> ranksp=i;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> output;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> end;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">data hoeffding1(keep=variable hcorr hpvalue rankho);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> length variable $ 80.;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> set hoeffding;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> array best(*) best1--best&nvar;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> array r(*) r1--r&nvar;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> array p(*) p1--p&nvar;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> do i=1 to dim(best);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> variable=best(i);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> hcorr=r(i);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> hpvalue=p(i);</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> rankho=i;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> output;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> end;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc sort data=spearman1;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> by variable;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc sort data=hoeffding1;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> by variable;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">data correlations;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> merge spearman1 hoeffding1;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> by variable;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc sort data=correlations;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> by ranksp;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc print data=correlations width=min;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">title "print of correlations";</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc sql noprint;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> /* first Spearman rank whose p-value exceeds the cutoff; nvar+1 if none does, so nothing is dropped */</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> select coalesce(min(ranksp), &nvar + 1) into: vref</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> from correlations</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> where spvalue > &pvalue;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> /* first Hoeffding rank whose p-value exceeds the cutoff; nvar+1 if none does, so nothing is dropped */</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> select coalesce(min(rankho), &nvar + 1) into: href</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> from correlations</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> where hpvalue > &pvalue;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">quit;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc sgplot data=correlations;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> refline &vref / axis=y;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> refline &href / axis=x;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> scatter y=ranksp x=rankho / datalabel=variable;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> yaxis label="Rank of Spearman";</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> xaxis label="Rank of Hoeffding";</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> title "Scatter Plot of the Ranks of Spearman vs Hoeffding";</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">run; </span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc sql;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> delete from correlations</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> where ranksp>&vref and rankho>&href;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">quit;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">%global screened;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">proc sql;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;"> select trim(left(variable)) into: screened separated by ' ' from correlations;</span></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">quit;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">%put &screened;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">%mend;</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">libname dyps "/data02/temp/temp_hsong/product_banner";</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: silver; font-family: Consolas; font-size: 10.0pt;">%uniscreen(dyps.dyps_trainoversamp2, ./varfile.txt, action, .2);</span><span style="font-family: Consolas; font-size: 10.0pt;"></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Garamond","serif"; font-size: 16.0pt;">The macro takes <span style="color: red;">four parameters</span>: the input data set, the text file listing all IVs (comma-delimited), the DV, and the p-value cutoff above which a variable counts as non-significant. To avoid dropping too many variables, this cutoff is usually set as high as .5. In this example we use .2.</span></div>
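<div class="MsoNormal"><span style="font-family: Garamond,serif; font-size: 16pt;">The macro reads the IV file with delimiter=',' and input varname $ @@. The file can therefore hold just the variable names, separated by commas. Below is a minimal sketch for writing such a file. The four names come from the variable list later in this post, and the path ./varfile.txt matches the macro call above.</span></div>

```sas
/* Write a comma-delimited list of IV names for %uniscreen to read */
data _null_;
  file "./varfile.txt";
  put "recentview_count,ps_lowest_price,keyword_count,impressions";
run;
```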
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="color: red; font-family: "Garamond","serif"; font-size: 16.0pt;">The macro produces two outputs: 1) a scatter plot of each variable's Spearman rank against its Hoeffding rank; 2) the macro variable screened, which contains the selected variables.</span></div>
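<div class="MsoNormal"><span style="font-family: Garamond,serif; font-size: 16pt;">Because screened is declared %global, a later model can use it directly. The sketch below makes two assumptions: %uniscreen has already run on the same data, and action is coded 0/1 with 1 as the event of interest.</span></div>

```sas
/* Assumes %uniscreen has populated &screened and action is coded 0/1 */
%put NOTE: screened variables: &screened;

proc logistic data=dyps.dyps_trainoversamp2;
  model action(event='1') = &screened;
run;
```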
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Garamond","serif"; font-size: 16.0pt;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglruao5GIFpF2gMJU9Oef43IRe4NXSENTF-G1SdQu2mgPaZH0Loc5Wi3G8MslJ8TNzZY4i7a7kwlMn-XQNJs7aeLHBZ6aGV-xviIo8_EAC1IsT7nvMXYXt1CnA8amyEweQspJye5zr4z0/s1600/image003-767056.jpg"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5840075630172568322" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglruao5GIFpF2gMJU9Oef43IRe4NXSENTF-G1SdQu2mgPaZH0Loc5Wi3G8MslJ8TNzZY4i7a7kwlMn-XQNJs7aeLHBZ6aGV-xviIo8_EAC1IsT7nvMXYXt1CnA8amyEweQspJye5zr4z0/s320/image003-767056.jpg" /></a></span><span style="font-family: "Garamond","serif"; font-size: 16.0pt;"></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Garamond","serif"; font-size: 16.0pt;">The plot above suggests that we can drop price_change_pct, avg_rating, total_ratings and num_sellers. Let's look at these variables: over 90% of price_change_pct values are 0, and over 75% of avg_rating and total_ratings values are 0.</span></div>
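<div class="MsoNormal"><span style="font-family: Garamond,serif; font-size: 16pt;">You can check these percentages directly. The sketch below uses PROC SQL. A comparison such as (avg_rating = 0) evaluates to 1 or 0, so its mean is the share of zeros. Note that missing values count as non-zero here.</span></div>

```sas
/* Share of zero values for each sparse variable (missing counts as non-zero) */
proc sql;
  select mean(price_change_pct = 0) as pct0_price_change  format=percent8.1,
         mean(avg_rating = 0)       as pct0_avg_rating    format=percent8.1,
         mean(total_ratings = 0)    as pct0_total_ratings format=percent8.1
  from dyps.dyps_trainoversamp2;
quit;
```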
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 12.0pt;">price_change_pct avg_rating total_ratings </span></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6F2SkHYFr7MpFyib2JjVZbre5y7rE2W-9PsQ3loyX5JDSA5Dm0fHelLfI3NuAXZGkvSRAscYcbjfMX4QHZZfvNeh-3BSb2HeAvX1Wzg4F_O4ZTXUJuoWS8vfwB23nr_wdzVRqumzjBRs/s1600/image004-770146.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5840075639963073890" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6F2SkHYFr7MpFyib2JjVZbre5y7rE2W-9PsQ3loyX5JDSA5Dm0fHelLfI3NuAXZGkvSRAscYcbjfMX4QHZZfvNeh-3BSb2HeAvX1Wzg4F_O4ZTXUJuoWS8vfwB23nr_wdzVRqumzjBRs/s320/image004-770146.png" /></a><span style="font-family: "Garamond","serif"; font-size: 16.0pt;"> </span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTJ8Vvmyp8dG9F7eeNLGWWn3g32dmAQ66TAdyWYMc6M-VkOQpYPLhw0n4JN1PvHo-iEq-T0b0y_QTerjiQ28_POTbYOJcW_qCS0ANEvyBS-nrVoAb_xjP3qUNEHCw4TiYVer5oaNyU9bA/s1600/image005-773825.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5840075655507449010" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTJ8Vvmyp8dG9F7eeNLGWWn3g32dmAQ66TAdyWYMc6M-VkOQpYPLhw0n4JN1PvHo-iEq-T0b0y_QTerjiQ28_POTbYOJcW_qCS0ANEvyBS-nrVoAb_xjP3qUNEHCw4TiYVer5oaNyU9bA/s320/image005-773825.png" /></a><span style="font-family: "Garamond","serif"; font-size: 16.0pt;"> </span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNypOqVhdinQ5aLyk-VMpKRK7aXb4sZNhhQ30vf72vDpe5r7YKEIjke7ZCZsXPEYhIntCQYH5SlD23-WJBBBmCZZhyWHfXRuTlVAO_dWEw0GeyBRkDoTNfgzcxaSBh2Q0ZMlmneTqdfeA/s1600/image006-776736.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5840075668516042642" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNypOqVhdinQ5aLyk-VMpKRK7aXb4sZNhhQ30vf72vDpe5r7YKEIjke7ZCZsXPEYhIntCQYH5SlD23-WJBBBmCZZhyWHfXRuTlVAO_dWEw0GeyBRkDoTNfgzcxaSBh2Q0ZMlmneTqdfeA/s320/image006-776736.png" /></a><span style="font-family: "Garamond","serif"; font-size: 16.0pt;"> </span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Garamond","serif"; font-size: 16.0pt;">After this screening, the following variables would pass to the next step: recentview_count ps_lowest_price days_after_ps_last_view keyword_count impressions rank pool_size lowest_price ref_rank days_after_ref days_after_last_view num_sellers.</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Garamond","serif"; font-size: 16.0pt;">Next we will check the linear relationship between each of these IVs and the DV, one at a time. (Question: <span style="background: silver;">how do we check for a linear relationship between an IV and a binary DV</span>?)</span></div>
<div class="MsoNormal">
<br /></div>
</div>
</div>
</div>
</div>
sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-5955423434831718272013-01-31T17:22:00.001-08:002013-01-31T17:22:50.696-08:00inputn and putn to convert between date and number%let num2date=%sysfunc(putn(10238, date9.));
<br>
<br>%put &num2date;
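<br>SAS stores a date as the number of days since 01JAN1960, so the %put above shows 12JAN1988 (10238 days after that origin). Here is a small round-trip sketch, with the expected log lines as comments:

```sas
/* Text -> day count -> text, entirely in macro code */
%let d = %sysfunc(inputn(12JAN1988, date9.));
%put &d;                              /* 10238 */
%put %sysfunc(putn(&d, date9.));      /* 12JAN1988 */

/* %sysfunc can also take the format as its second argument */
%put %sysfunc(inputn(31JAN2013, date9.), date9.);   /* 31JAN2013 */
```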
<br>
<br>
<br>%let date=%sysfunc(inputn(31jan2013, date9.));
<br>
<br>%put &date;sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-10339742267143372302013-01-11T19:19:00.003-08:002013-01-11T19:19:31.799-08:00SAS: use CALL EXECUTE or MACRO Variables to score data<br />
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="color: green; font-family: Courier New;">Some SAS procedures, such as PROC FMM, do not score the data for you, so we have to score it manually with the fitted formula.</span></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="color: green; font-family: Courier New;"><br /></span></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="color: green; font-family: Courier New;">Here we show two methods. The first uses CALL EXECUTE to generate the score code. The second stores each coefficient in a macro variable.</span></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="color: green; font-family: Courier New;"><br /></span></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: green; font-family: "Courier New";"><br /></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: green; font-family: "Courier New";">*** coefficients for the
variables ***;</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">data</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> coef;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: blue; font-family: "Courier New";">input</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> coef1 coef2 coef3;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: blue; font-family: "Courier New";">cards</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: #ffffc0; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">.1 .26 .58<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">run</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: green; font-family: "Courier New";">*** data to be scored ***;</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">data</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> test;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: blue; font-family: "Courier New";">array</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> a_iv{*} x1 x2 x3;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: blue; font-family: "Courier New";">do</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> j=</span><b><span style="background: white; color: teal; font-family: "Courier New";">1</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">to</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><b><span style="background: white; color: teal; font-family: "Courier New";">100</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">do</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> i=</span><b><span style="background: white; color: teal; font-family: "Courier New";">1</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">to</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><b><span style="background: white; color: teal; font-family: "Courier New";">3</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> a_iv[i]=rand(</span><span style="background: white; color: purple; font-family: "Courier New";">'normal'</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">,i);<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">end</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">output</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: blue; font-family: "Courier New";">end</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">run</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: green; font-family: "Courier New";">*** First: use call execute to
generate the score code ***;</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">data</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">_null_</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">set</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> coef </span><span style="background: white; color: blue; font-family: "Courier New";">end</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">=last;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">if</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> _n_=</span><b><span style="background: white; color: teal; font-family: "Courier New";">1</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">then</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: silver; color: blue; font-family: "Courier New"; mso-highlight: silver; mso-shading: white;">call</span><span style="background-color: silver; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> execute</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">(</span><span style="background: white; color: purple; font-family: "Courier New";">' data score(drop=i j); set test; key=_n_;
score= 1 + x1 * '</span><span style="background-color: silver; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">|| coef1 ||</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: purple; font-family: "Courier New";">'+x2 * '</span><span style="background-color: silver; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">|| coef2 ||</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: purple; font-family: "Courier New";">'+x3 * '</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> <span style="background: silver; mso-highlight: silver;">|| coef3 ||</span> </span><span style="background: white; color: purple; font-family: "Courier New";">'; run;'</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> );<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">run</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><br /></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">SAS writes the generated code to the log, as shown below. This is our scoring code:</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Fa1zgw1WftFF1ufQUokqEkoSx0VBIf75-Bx1aIe8bTa3z5VkaHNEnmpxxmtLBnZPV6xE91Go5DcH34kO-PhGPxunH76bJ8KGdOiTysLAzUh_oaWBuiCyKQ1o_4ssxZCARr_P92iZvEE/s1600/01+_call_execute.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Fa1zgw1WftFF1ufQUokqEkoSx0VBIf75-Bx1aIe8bTa3z5VkaHNEnmpxxmtLBnZPV6xE91Go5DcH34kO-PhGPxunH76bJ8KGdOiTysLAzUh_oaWBuiCyKQ1o_4ssxZCARr_P92iZvEE/s1600/01+_call_execute.JPG" /></a></div>
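The mechanism behind CALL EXECUTE (assemble program text from data values, then run it) can be sketched in Python. This is an illustrative analogue, not SAS: the coefficient values and test rows below are hypothetical stand-ins for the coef and test data sets, and Python's exec plays the role of CALL EXECUTE.

```python
def generate_score_code(coefs):
    """Build scoring source text from coefficient values, like the string passed to CALL EXECUTE."""
    # Each coefficient is pasted into the text as a literal, just as coef1-coef3 are concatenated in SAS.
    terms = " + ".join(f"row['{name}'] * {value!r}" for name, value in coefs.items())
    return f"score = 1 + {terms}"

def run_generated(code, rows):
    """Execute the generated statement once per row and collect the scores."""
    scores = []
    for row in rows:
        namespace = {"row": row}
        exec(code, namespace)  # run the generated text, as SAS runs the queued DATA step
        scores.append(namespace["score"])
    return scores

coefs = {"x1": 0.5, "x2": -2.0, "x3": 3.0}  # hypothetical coefficient values
rows = [{"x1": 1, "x2": 2, "x3": 3}, {"x1": 4, "x2": 0, "x3": 1}]  # hypothetical test rows
code = generate_score_code(coefs)
print(code)
print(run_generated(code, rows))
```

As in the SAS version, printing the generated text (the counterpart of reading the log) shows exactly what gets executed.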
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">proc</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><b><span style="background: white; color: navy; font-family: "Courier New";">print</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">data</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">=score;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">run</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: green; font-family: "Courier New";">*** Second: with the coefficients in one
column (after the transpose below), store each one in a macro variable, and then score using
those macro variables ***;</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">proc</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><b><span style="background: white; color: navy; font-family: "Courier New";">transpose</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">data</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">=coef </span><span style="background: white; color: blue; font-family: "Courier New";">out</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">=coef1;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">run</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: green; font-family: "Courier New";">***
generate one macro variable per coefficient in PROC SQL, using INTO with
THROUGH ***;</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">proc</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><b><span style="background: white; color: navy; font-family: "Courier New";">sql</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">select</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> col1 </span><span style="background: white; color: blue; font-family: "Courier New";">into</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">: m_coef1 <span style="background: silver; mso-highlight: silver;">through</span> : m_coef3 </span><span style="background: white; color: blue; font-family: "Courier New";">from</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> coef1;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">quit</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: blue; font-family: "Courier New";">%put</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> &m_coef1 &m_coef2
&m_coef3;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">data</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> score2(</span><span style="background: white; color: blue; font-family: "Courier New";">keep</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">=key score2);<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">set</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> test;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> key=_n_;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> score2=</span><b><span style="background: white; color: teal; font-family: "Courier New";">1</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">+x1*&m_coef1
+ x2*&m_coef2 + x3*&m_coef3;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: green; font-family: "Courier New";">*** if there are
many variables, we can use a loop to build the formula ***;</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">run</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
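For many variables, the loop mentioned in the comment above simply walks the coefficient column and appends one term per variable; in SAS this would typically be a %DO loop inside a macro. A Python sketch of the same idea, assuming the coefficients arrive as a list in x1, x2, ... order (hypothetical values). It also checks, in the spirit of the score/score2 comparison, that the generated formula and a direct dot product agree.

```python
def build_formula(coef_column):
    """Concatenate '1 + x1*c1 + x2*c2 + ...' for any number of coefficients."""
    parts = ["1"]  # intercept, as in score2 = 1 + ...
    for i, c in enumerate(coef_column, start=1):
        parts.append(f"x{i}*{c!r}")
    return " + ".join(parts)

def score_direct(xs, coef_column):
    """Score without generated code: intercept 1 plus the dot product."""
    return 1 + sum(x * c for x, c in zip(xs, coef_column))

coef_column = [0.5, -2.0, 3.0, 1.25]  # hypothetical: one coefficient per row, as after PROC TRANSPOSE
formula = build_formula(coef_column)
xs = [1, 2, 3, 4]
env = {f"x{i}": x for i, x in enumerate(xs, start=1)}  # variable values for one observation
print(formula)
print(eval(formula, dict(env)), score_direct(xs, coef_column))
```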
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background: white; color: green; font-family: "Courier New";">***
score and score2 should be the same
***;</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">data</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> merged;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">merge</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> score score2;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">
</span><span style="background: white; color: blue; font-family: "Courier New";">by</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> key;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">run</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">proc</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><b><span style="background: white; color: navy; font-family: "Courier New";">print</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';"> </span><span style="background: white; color: blue; font-family: "Courier New";">data</span><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">=merged;<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background: white; color: navy; font-family: "Courier New";">run</span></b><span style="background-color: white; background-position: initial initial; background-repeat: initial initial; font-family: 'Courier New';">;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<o:p>The output looks like this:</o:p></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEid8OFA3PiYB2BCpi8TGeipNm3qUQgmFdu7ANVpi60Q8rfMMUJbRKC9-9rAboGtDly7Ce5pxqursaEAoE3ac4HXRoKvG4nVbLBkJZdEjg3oV_dr4zaTxDaZwbf8NoTtEeJe3pFW10qu6MA/s1600/final.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEid8OFA3PiYB2BCpi8TGeipNm3qUQgmFdu7ANVpi60Q8rfMMUJbRKC9-9rAboGtDly7Ce5pxqursaEAoE3ac4HXRoKvG4nVbLBkJZdEjg3oV_dr4zaTxDaZwbf8NoTtEeJe3pFW10qu6MA/s1600/final.JPG" /></a></div>
<div class="MsoNormal">
<br /></div>
sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-31391626417475487162012-12-13T18:43:00.000-08:002012-12-13T18:43:07.110-08:00Cluster levels of Categorical variable to avoid over-fitting<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<span style="font-family: Courier New, Courier, monospace;">Consider this setting: the target variable target_revenue is continuous. The predictors include continuous variables such as hist_visits, as well as <span style="color: red;">categorical variables</span> such as best_leafnode_id, which <span style="color: red;">has hundreds of levels.</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">If we use best_leafnode_id directly, the model can fit the training data very well: each level gets its own estimated coefficient, so a predictor with hundreds of levels adds hundreds of parameters, and the extra degrees of freedom let the model follow the training data more closely. This is shown in the second graph below.</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">Because the model has so many parameters, one potential problem is over-fitting: it fits the training data well but does not predict well on the validation data. </span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">In the second graph, on the
<span style="color: red;">training </span>data reg7 (the regression with all original levels) has a lower
MSE=<span style="color: red;">83.95 </span>than reg6 (the regression with clustered levels), MSE=<span style="color: red;">86.07</span>.
On the <span style="color: red;">validation </span>data, however, reg6 (MSE=<span style="color: red;">82.129</span>) beats
reg7 (MSE=<span style="color: red;">105.655</span>). So using the original levels without
clustering them over-fits here. <span style="color: blue;">This
suggests that clustering the levels helps avoid over-fitting. The number of new levels must be chosen with care, though: too few
levels throw away real differences between groups, and too many bring the over-fitting back.</span></span><br />
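The pattern in these MSE numbers can be reproduced on toy data. The sketch below is a simplified illustration, not the post's models: it uses only the categorical predictor, fits one mean per group, and the data and the clustering are made up. A per-level fit can never do worse than a grouped fit on the training data, because each level's own mean is the least-squares best constant for that level; whether it does worse on validation data depends on the data.

```python
def fit_means(pairs, group_of=lambda level: level):
    """Fit a 'mean per group' model; pairs are (level, y) and group_of maps a level to its group."""
    sums, counts = {}, {}
    for level, y in pairs:
        g = group_of(level)
        sums[g] = sums.get(g, 0.0) + y
        counts[g] = counts.get(g, 0) + 1
    means = {g: sums[g] / counts[g] for g in sums}
    return lambda level: means[group_of(level)]  # the fitted model predicts its group's mean

def mse(model, pairs):
    """Mean squared error of the model's predictions on (level, y) pairs."""
    return sum((y - model(level)) ** 2 for level, y in pairs) / len(pairs)

# Made-up data: levels A and B share one true mean, C and D another.
train = [("A", 12), ("B", 8), ("C", 22), ("D", 18)]
valid = [("A", 9), ("B", 11), ("C", 19), ("D", 21)]
clusters = {"A": "low", "B": "low", "C": "high", "D": "high"}  # hypothetical clustering

full = fit_means(train)                             # one parameter per level
grouped = fit_means(train, group_of=clusters.get)   # one parameter per cluster
for name, model in [("all levels", full), ("clustered", grouped)]:
    print(name, "train MSE:", mse(model, train), "validation MSE:", mse(model, valid))
```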
<div class="MsoNormal">
<span style="font-family: Courier New, Courier, monospace;"><o:p></o:p></span></div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">Below is the SAS code that clusters this high-cardinality categorical variable. First, calculate the mean of target_revenue at each level of best_leafnode_id; then bucket the levels of best_leafnode_id into fewer levels based on those means. Finally, use a format to map the old levels to the new ones. We choose the number of new levels ourselves. </span><br />
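These steps can be sketched in Python. This assumes the buckets are formed by ranking the level means into k roughly equal-sized groups, which is one reasonable choice; the SAS code in Pic01 may bucket differently. The returned dict plays the role of the format that maps old levels to new ones, and the data are made up.

```python
from collections import defaultdict

def cluster_levels(pairs, k):
    """Map each level to one of k buckets by ranking the levels' mean target value."""
    totals, counts = defaultdict(float), defaultdict(int)
    for level, y in pairs:
        totals[level] += y
        counts[level] += 1
    means = {level: totals[level] / counts[level] for level in totals}  # step 1: mean per level
    ordered = sorted(means, key=means.get)  # levels from lowest to highest mean
    n = len(ordered)
    # Step 2: rank r (0-based) goes to bucket floor(r * k / n), giving k roughly equal groups.
    return {level: r * k // n for r, level in enumerate(ordered)}

# Made-up (best_leafnode_id, target_revenue) pairs
data = [(101, 5.0), (101, 7.0), (102, 30.0), (103, 12.0), (104, 28.0), (105, 11.0)]
mapping = cluster_levels(data, k=3)  # step 3: old level -> new level, like a format
print(mapping)
```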
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<br />
<span style="color: blue;">Pic01 -- SAS code to cluster levels</span><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhol-020x8LbgFvImdqAvHL9NUh-7zTb-_1vbitrtowqICatSbDfs-PySk6ZCC3NjzBjtY5Ib8xN2LAlOFZogBPESzd8jnAn83DrDXMmS4DBQyHrBpce34vfedlEUyQ_uc7sZaDlVFBS5o/s1600/001.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhol-020x8LbgFvImdqAvHL9NUh-7zTb-_1vbitrtowqICatSbDfs-PySk6ZCC3NjzBjtY5Ib8xN2LAlOFZogBPESzd8jnAn83DrDXMmS4DBQyHrBpce34vfedlEUyQ_uc7sZaDlVFBS5o/s1600/001.png" /></a><br />
<br />
<br />
<br />
<span style="color: blue;">Pic02 -- Comparing the two models: </span><br />
<span style="color: blue;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXHPgbENxzuY9Bo9wvEGjrICxMlXiKnwCHXHYJw_vDIP61arhnv56dvt3LO1TzCOaqFULeis9HqJrjQJET-LkpNTp2E-fwxZUrG6UJmUwMGjr219N8Ir_j0F78Y7tWGdR3xxKL0YPVp40/s1600/002.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXHPgbENxzuY9Bo9wvEGjrICxMlXiKnwCHXHYJw_vDIP61arhnv56dvt3LO1TzCOaqFULeis9HqJrjQJET-LkpNTp2E-fwxZUrG6UJmUwMGjr219N8Ir_j0F78Y7tWGdR3xxKL0YPVp40/s1600/002.png" /></a></div>
<br />sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-87589456506998271202012-12-09T22:29:00.000-08:002012-12-09T22:29:02.975-08:00R: Replicate plot with ggplot2 (part 4) bar chartThe original plot from UCLA ATS is <a href="http://www.ats.ucla.edu/stat/r/gbe/barchart.htm">here</a>. Here the same plot is done in package ggplot2.<br />
<br />
Let's read in the data:<br />
<script src="https://gist.github.com/4248790.js?file=gistfile1.r"></script>
The bar chart here shows how many observations fall in each level of ses (1, 2 and 3, which will later be relabeled "low", "median" and "high").<br />
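What the plain bar chart draws is easy to compute: one bar per level of ses, with height equal to the number of observations at that level. A minimal Python sketch on made-up values (not the hsb2 data); the optional label mapping mirrors the relabeling used in the second plot.

```python
from collections import Counter

def bar_heights(values, labels=None):
    """Count observations per level; optionally rename the levels first."""
    counts = Counter(values)
    if labels is not None:
        counts = Counter({labels[level]: n for level, n in counts.items()})
    return counts

ses = [1, 2, 2, 3, 2, 1, 3, 2]  # made-up ses codes
print(sorted(bar_heights(ses).items()))
print(bar_heights(ses, {1: "low", 2: "median", 3: "high"}))
```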
<br />
The first plot is the plain bar chart:<br />
<br />
<script src="https://gist.github.com/4248816.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuK8ouG7sg6nZLBIj9auFszQyecwPXravIy2BSMvvXx0h-fFSCFGpmjNjUvTTlZbi2Ed4QkvBbj2rMih1if5l2TRSwK0JGJyJQmsbj7T6En7XXwLiHJPOesUEXADFsPQoPUGy6pK5FvdM/s1600/01.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuK8ouG7sg6nZLBIj9auFszQyecwPXravIy2BSMvvXx0h-fFSCFGpmjNjUvTTlZbi2Ed4QkvBbj2rMih1if5l2TRSwK0JGJyJQmsbj7T6En7XXwLiHJPOesUEXADFsPQoPUGy6pK5FvdM/s640/01.JPG" width="585" /></a><br />
<span style="text-align: center;">The second one replaces the original values of ses with "low", "median" and "high", respectively:</span><br />
<br />
<script src="https://gist.github.com/4248795.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj64rshToWJ5W5tGKD4ZB07y0TZxdn0tqRti2Hx3Zq7dXoqONKJkCQtYCgTD-Dx4qo6bLrSbJH6uZprgWdUEy56SLWsjZayJEied7UZKiVXCed4ukmMyoL8X6qb-HaQ1j3izxYN1DZdD5Q/s1600/02.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj64rshToWJ5W5tGKD4ZB07y0TZxdn0tqRti2Hx3Zq7dXoqONKJkCQtYCgTD-Dx4qo6bLrSbJH6uZprgWdUEy56SLWsjZayJEied7UZKiVXCed4ukmMyoL8X6qb-HaQ1j3izxYN1DZdD5Q/s640/02.JPG" width="586" /></a><br />
The third one changes the width of the bars.<br />
<br />
<script src="https://gist.github.com/4248797.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9mjmTYOglsQBtC5-tV67T6hnYiIWQrPyNfi-ZobXfUB1d4oty_Mwf0AWdrjKzGa_Zk6caXrpyIYADQZsCFJU8uJl7Ta0QmdG-uzo2p7Cu36KQsnQyJtdDdB0LuH5q26ZepmDQuDY3KSo/s1600/03.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9mjmTYOglsQBtC5-tV67T6hnYiIWQrPyNfi-ZobXfUB1d4oty_Mwf0AWdrjKzGa_Zk6caXrpyIYADQZsCFJU8uJl7Ta0QmdG-uzo2p7Cu36KQsnQyJtdDdB0LuH5q26ZepmDQuDY3KSo/s640/03.JPG" width="582" /></a><br />
The fourth one groups the data by female and stacks the bars for its two levels (0 and 1) within each level of ses. In a stacked plot it is not easy to tell which group is higher, so the fifth graph plots the groups separately.<br />
<br />
<script src="https://gist.github.com/4248803.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6CWb1EN-vC8JtQcQWyDwf-A4q3CdXJ_pRXss8khD8rLt8Xcjo9Yn49I2zSfqjOy5Vgn5WFvE9Vrwli4FpoN3rDbvgK5BE6QAs9Hnj80Q88b7N8DFUSgF9RL_xtZy52k7FGThl-_iTQrA/s1600/04.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6CWb1EN-vC8JtQcQWyDwf-A4q3CdXJ_pRXss8khD8rLt8Xcjo9Yn49I2zSfqjOy5Vgn5WFvE9Vrwli4FpoN3rDbvgK5BE6QAs9Hnj80Q88b7N8DFUSgF9RL_xtZy52k7FGThl-_iTQrA/s640/04.JPG" width="560" /></a><br />
The fifth one plots a separate bar chart for each level of female, that is, two bar charts of ses, one for female = 0 and one for female = 1.<br />
<br />
<script src="https://gist.github.com/4248805.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIN5ivLpuz2oiqqFL5YdlWUE3k-7bhcUE4o3lAtsBC0c9HhsHRoOWIF_RDn8ySoETNdI8GBiA68iOjlbGUwUBEH0j0imhfrFNGKoQtHDQvS6y5hwuA0-oT92LE1WYB6rYbGNApbUJBrZI/s1600/05.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIN5ivLpuz2oiqqFL5YdlWUE3k-7bhcUE4o3lAtsBC0c9HhsHRoOWIF_RDn8ySoETNdI8GBiA68iOjlbGUwUBEH0j0imhfrFNGKoQtHDQvS6y5hwuA0-oT92LE1WYB6rYbGNApbUJBrZI/s640/05.JPG" width="578" /></a><br />
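The stacked chart and the separate charts are drawn from the same table of counts, one count per (female, ses) combination: stacking piles the two groups' counts on top of each other, while the fifth graph gives each group its own panel. A Python sketch of that table on made-up values:

```python
from collections import Counter

def counts_by_group(groups, values):
    """Count observations for each (group, level) pair: the bar heights per panel."""
    return Counter(zip(groups, values))

female = [0, 1, 1, 0, 1, 0, 1, 1]  # made-up data
ses = [1, 2, 2, 3, 2, 1, 3, 2]
table = counts_by_group(female, ses)
for g in (0, 1):
    # A Counter returns 0 for combinations that never occur.
    print("female =", g, {level: table[(g, level)] for level in (1, 2, 3)})
```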
<br />sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-68738111093285820442012-12-09T18:59:00.002-08:002012-12-09T19:00:58.594-08:00R: Replicate plot with ggplot2 (part 3) boxplotThis is to replicate the boxplot from UCLA ATS: <a href="http://www.ats.ucla.edu/stat/r/gbe/boxplot.htm">Examples of box plots</a><br />
<br />
First read in the data:<br />
<br />
<script src="https://gist.github.com/4248073.js?file=gistfile1.r"></script>
The <span style="color: red;">first </span>is a plain boxplot with no extra features:<br />
<br />
<script src="https://gist.github.com/4248077.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1oH8RM56ciCfS1QNunIUkY5yAm5JSHi7ZK3QBk3OY19YUfY7JpG7bpPnVYcR-YamN654brJ_X-0X7HO8IW-5Fcwrmeaez3t2JtmQnqWV6fKJmWGdGXHQ1rs8jBBHI_8NM-E-cZU1WmoM/s1600/01.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1oH8RM56ciCfS1QNunIUkY5yAm5JSHi7ZK3QBk3OY19YUfY7JpG7bpPnVYcR-YamN654brJ_X-0X7HO8IW-5Fcwrmeaez3t2JtmQnqWV6fKJmWGdGXHQ1rs8jBBHI_8NM-E-cZU1WmoM/s640/01.JPG" width="578" /></a><br />
The <span style="color: red;">second </span>fills the box with blue.<br />
<br />
<script src="https://gist.github.com/4248078.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgE-ntVAZDSnTQMlQGW0M-r8xCuU16TRnkTamhM-ufiHUcz9m5Tj-ah9kDkgo3VqMofQwicadxHL9x9oHaBc3PDHriupZ-HGNmU4Zw4-koHsuQ2o_ByZK2u3N6gWwis27Nq6U8Fd4lVDxo/s1600/02.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgE-ntVAZDSnTQMlQGW0M-r8xCuU16TRnkTamhM-ufiHUcz9m5Tj-ah9kDkgo3VqMofQwicadxHL9x9oHaBc3PDHriupZ-HGNmU4Zw4-koHsuQ2o_ByZK2u3N6gWwis27Nq6U8Fd4lVDxo/s640/02.JPG" width="582" /></a><br />
The <span style="color: red;">third </span>is the boxplot split by the categorical variable ses:<br />
<br />
<script src="https://gist.github.com/4248081.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9bpLMOUuORQMkEqEjqChh1IsnhnsBx9RBY1QT85ebnt37iXQfAxYT7ylb5p3NHv9uNerGNTmeGXoDyvBv0YR0VCVSzXjs5VCKsaTRShzB6Top57I4DkbeJ9dylRQlfDpyUsxz1XwQjlI/s1600/03.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9bpLMOUuORQMkEqEjqChh1IsnhnsBx9RBY1QT85ebnt37iXQfAxYT7ylb5p3NHv9uNerGNTmeGXoDyvBv0YR0VCVSzXjs5VCKsaTRShzB6Top57I4DkbeJ9dylRQlfDpyUsxz1XwQjlI/s640/03.JPG" width="586" /></a><br />
<span style="color: red;">Next </span>is to rename the levels of the categorical variable:<br />
<br />
<script src="https://gist.github.com/4248084.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihSEgNoKKRzFK_WtNSHVdCtxk7M7_DcLxX5x1IWpWadatcTza74hitjo_FAfIgX9mK_cmeLL07WFjSQmi34R3s0_juEP8X8GG1GHeVVaaw1SQJwTSeWLLE8dw4VnGaUFGVgS5SlC4yC9w/s1600/04.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihSEgNoKKRzFK_WtNSHVdCtxk7M7_DcLxX5x1IWpWadatcTza74hitjo_FAfIgX9mK_cmeLL07WFjSQmi34R3s0_juEP8X8GG1GHeVVaaw1SQJwTSeWLLE8dw4VnGaUFGVgS5SlC4yC9w/s640/04.JPG" width="580" /></a><br />
The <span style="color: red;">fifth </span>one sets notch = TRUE, which draws notched boxes:<br />
<br />
<script src="https://gist.github.com/4248086.js?file=gistfile1.txt"></script>
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWrrewKHI1ilrYdqpYwgGCHxn_smW46TQxc88yHVS8A4rixPtpxusKXzPDuU-h6uO7mHuGGcYYUVvwCcyw42aZhe3eMh5czgZNO3UV_u7Jz3Ss9Rcq2MLYMZw6urLQ8JrAw3uRYcIVYGA/s1600/05.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWrrewKHI1ilrYdqpYwgGCHxn_smW46TQxc88yHVS8A4rixPtpxusKXzPDuU-h6uO7mHuGGcYYUVvwCcyw42aZhe3eMh5czgZNO3UV_u7Jz3Ss9Rcq2MLYMZw6urLQ8JrAw3uRYcIVYGA/s640/05.JPG" width="588" /></a><br />
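For reference, the notch that ggplot2 draws spans median +/- 1.58 * IQR / sqrt(n), the formula given for geom_boxplot's notches in the ggplot2 documentation; if the notches of two boxes do not overlap, that is informal evidence that their medians differ. A Python sketch of the computation, assuming R's default quartile definition (type 7), which statistics.quantiles reproduces with method="inclusive". The data are made up.

```python
import math
import statistics

def notch_limits(values):
    """Return (median - h, median + h) with h = 1.58 * IQR / sqrt(n)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")  # type-7 quartiles
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width

low, high = notch_limits([1, 2, 3, 4, 5, 6, 7, 8, 9])  # made-up data
print(round(low, 3), round(high, 3))
```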
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
The <span style="color: red;">sixth </span>plots the boxes for the <span style="color: blue;">interaction </span>levels of more than one categorical variable:<br />
<br />
<script src="https://gist.github.com/4248088.js?file=gistfile1.txt"></script>
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiicSjShenawwJ2ROWvwGncZhCKnQpQ1kbJ2HBjZLb-aY-AKZ1DyJVE08R5m-k6JLVcbj9M8aOXEks-8KNnr6UT-WmUikqeDNNKDLCqI51ckZukVYbMovRPKQUh-mNnXZl4S9sKbmaDIYc/s1600/06.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiicSjShenawwJ2ROWvwGncZhCKnQpQ1kbJ2HBjZLb-aY-AKZ1DyJVE08R5m-k6JLVcbj9M8aOXEks-8KNnr6UT-WmUikqeDNNKDLCqI51ckZukVYbMovRPKQUh-mNnXZl4S9sKbmaDIYc/s640/06.JPG" width="584" /></a><br />
<br />
<br />
One more thing: to add the outliers to the plot, the code looks like:<br />
<br />
<script src="https://gist.github.com/4248094.js?file=gistfile1.r"></script>
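Which points count as outliers follows the usual boxplot rule: anything more than 1.5 * IQR below the first quartile or above the third (1.5 is the default of the coef argument of ggplot2's stat_boxplot). A minimal Python sketch of that rule on made-up data, again assuming type-7 quartiles:

```python
import statistics

def boxplot_outliers(values, coef=1.5):
    """Return the points lying more than coef * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - coef * iqr, q3 + coef * iqr  # the whiskers cannot reach past these fences
    return [v for v in values if not (lower <= v <= upper)]

print(boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 40]))  # made-up data with one extreme value
```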
<br />
<br />
<br />sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0tag:blogger.com,1999:blog-3207512912949101943.post-22260295533597450382012-12-08T14:12:00.004-08:002012-12-09T19:02:22.870-08:00R: Replicate plot with ggplot2 (part 2) histogram, empirical curve, normal density curveThe first part replicated the scatter plot <a href="http://www.songhuiming.com/2012/11/r-replicate-plot-with-ggplot2.html">here</a>. This part replicates the histogram.<br />
<br />
The original UCLA ATS page is: http://www.ats.ucla.edu/stat/r/gbe/histogram.htm.<br />
<br />
In ggplot2, <span style="color: red;">geom_histogram </span>is used to draw histogram.<br />
<br />
The data is:<br />
<script src="https://gist.github.com/4242101.js?file=gistfile1.txt"></script>
<br />
The <span style="color: red;">first </span>one is a histogram with black fill:<br />
<br />
<script src="https://gist.github.com/4242165.js?file=gistfile1.r"></script>
The output graph is:<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb0lCRUFSYJfxudU7rJmICDbfrCpEyJV6GhIy4Ksyyy_Cy8rIXyS-ezZ-XuIrq_OJ2s3Lk_VYhahqXPu59i5bqvbRHhALDZfELTac7pRi0XX5gIX7upPa4wTErqQU3XaBYzUNLDcpIe4Y/s1600/01.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="466" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb0lCRUFSYJfxudU7rJmICDbfrCpEyJV6GhIy4Ksyyy_Cy8rIXyS-ezZ-XuIrq_OJ2s3Lk_VYhahqXPu59i5bqvbRHhALDZfELTac7pRi0XX5gIX7upPa4wTErqQU3XaBYzUNLDcpIe4Y/s640/01.JPG" width="640" /></a><br />
ggplot2 also lets you <span style="color: blue;">fill the bars by the count</span> in each bin, like this:<br />
<br />
<script src="https://gist.github.com/4242171.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDOYKYKvrPggtQsDs0iw5MUixMsIkxVnT3EJU-bpUyAE6VHYQGKBGumllk2ZrOoCq9Lf5zON_Ts-oNnhqwsjFRCApOPRWYsKhej1JpziZrJquIWwW1cQ98mdrSA3eBEfMg5bHGMlG6qKE/s1600/02.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDOYKYKvrPggtQsDs0iw5MUixMsIkxVnT3EJU-bpUyAE6VHYQGKBGumllk2ZrOoCq9Lf5zON_Ts-oNnhqwsjFRCApOPRWYsKhej1JpziZrJquIWwW1cQ98mdrSA3eBEfMg5bHGMlG6qKE/s640/02.JPG" width="640" /></a><br />
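A minimal sketch of filling bars by their counts, again on hypothetical data. It maps `fill` to the computed count; `after_stat(count)` is the spelling used since ggplot2 3.3.0 for what older code wrote as `..count..`:

```r
library(ggplot2)

# Hypothetical data: 200 draws from N(0, 1)
set.seed(1)
df <- data.frame(x = rnorm(200))

# Map fill to each bin's count, so taller bars get a different shade
p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 30)
print(p)
```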
<span style="color: red;">Next</span> is to change the bin width. In ggplot2 this can be done with either the binwidth argument or the breaks argument. Here it is done with binwidth:<br />
<br />
<script src="https://gist.github.com/4242174.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQWjT65YWYFrPRBvoj6ywMQNvsHwdwNrueR18HoKo9KgcJfTPJb2D4SICYxgfZu24PaL_3a-xMDE_xR_H5-q7_BR7ZxITmRAfOc3PZSLL-wS6pxiZRQcpDfPPaQRYhNSEUtnTxdD20Dmg/s1600/03.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="472" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQWjT65YWYFrPRBvoj6ywMQNvsHwdwNrueR18HoKo9KgcJfTPJb2D4SICYxgfZu24PaL_3a-xMDE_xR_H5-q7_BR7ZxITmRAfOc3PZSLL-wS6pxiZRQcpDfPPaQRYhNSEUtnTxdD20Dmg/s640/03.JPG" width="640" /></a><br />
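A sketch of both ways to control the bins, on hypothetical data. The bin width of 0.5 and the break sequence `brks` are illustrative assumptions; the breaks run from `floor(min(x))` to `ceiling(max(x))` so every point falls inside some bin:

```r
library(ggplot2)

# Hypothetical data: 200 draws from N(0, 1)
set.seed(1)
df <- data.frame(x = rnorm(200))

# Option 1: fix the width of every bin
p_width <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.5, fill = "black", colour = "white")

# Option 2: give the bin edges explicitly (hypothetical 0.5 steps)
brks <- seq(floor(min(df$x)), ceiling(max(df$x)), by = 0.5)
p_breaks <- ggplot(df, aes(x = x)) +
  geom_histogram(breaks = brks, fill = "black", colour = "white")

print(p_width)
print(p_breaks)
```

binwidth is convenient when only the resolution matters; breaks is useful when the bin edges themselves must line up with meaningful values.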
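<span style="color: red;">Next</span> is to change the y axis from frequency to density. Density is not exactly a percentage: the bar heights are scaled so that the bar areas sum to 1, so they equal proportions only when the bin width is 1. In ggplot2 this is done by mapping y to ..density..:<br />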
<br />
<script src="https://gist.github.com/4242176.js?file=gistfile1.r"></script>
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAPEcSTD9DHHYPwLBuL52BMVbRe_kWERpSBbRMGZ-b52x3Qmb1xQ5urDOOijX0TMeh5x971dicKbV-eNUHZW6135P6g38n12YRocwPaS2xCF_kFoy5q6PBu0hWhbEZnkt5iNmssDDkQ4g/s1600/04.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="470" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAPEcSTD9DHHYPwLBuL52BMVbRe_kWERpSBbRMGZ-b52x3Qmb1xQ5urDOOijX0TMeh5x971dicKbV-eNUHZW6135P6g38n12YRocwPaS2xCF_kFoy5q6PBu0hWhbEZnkt5iNmssDDkQ4g/s640/04.JPG" width="640" /></a><br />
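A minimal sketch of the density-scaled histogram on hypothetical data. It uses `after_stat(density)`, the current form of `..density..`; the computed density is count / (n * binwidth), so the bar areas add up to 1:

```r
library(ggplot2)

# Hypothetical data: 200 draws from N(0, 1)
set.seed(1)
df <- data.frame(x = rnorm(200))

# y = density instead of count; bar areas (height * width) sum to 1
p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5,
                 fill = "black", colour = "white")
print(p)
```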
Then add a <span style="color: blue;">normal density curve</span> and an empirical curve to the plot.<br />
<br />
First is the empirical curve:<br />
<br />
<script src="https://gist.github.com/4242179.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1o9KEA-dw5cUCvnJqDls4XQhovkOPTNj_7Oeeu1cNq_c3YqUXXrVhyz3VVPRsE1wz_PdxwuEoTsDWWeKxxKg33RWLSFd-hx0pnOeAA2Jv0CZDuBBIhtLDfkmWVCu6QG_dQzi6wWhiwWU/s1600/05.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="468" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1o9KEA-dw5cUCvnJqDls4XQhovkOPTNj_7Oeeu1cNq_c3YqUXXrVhyz3VVPRsE1wz_PdxwuEoTsDWWeKxxKg33RWLSFd-hx0pnOeAA2Jv0CZDuBBIhtLDfkmWVCu6QG_dQzi6wWhiwWU/s640/05.JPG" width="640" /></a><br />
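The gist's code is not reproduced here, so as an assumption this sketch takes the "empirical curve" to be a kernel density estimate drawn with geom_density, overlaid on the density-scaled histogram. The data are hypothetical:

```r
library(ggplot2)

# Hypothetical data: 200 draws from N(0, 1)
set.seed(1)
df <- data.frame(x = rnorm(200))

# The histogram must be on the density scale so the curve and bars share a y axis;
# geom_density adds a kernel density estimate as the empirical curve
p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5,
                 fill = "grey70", colour = "white") +
  geom_density(colour = "red")
print(p)
```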
Next is the normal density curve, with the mean and standard deviation taken from the generated data:<br />
<br />
<script src="https://gist.github.com/4242181.js?file=gistfile1.r"></script>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghe1QyTJ5dIsKiTfQrcU0DoIVWbqkLoldhwyJiXPEaHhTWXdQaIZbXffjlBzzPSpuzc-dB7hSyFS8Fdoc-LIssmca7capzklxYvRwszaCAzU-py-WEYT-8TYatout8uIb8_GsZwULVO50/s1600/06.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="464" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghe1QyTJ5dIsKiTfQrcU0DoIVWbqkLoldhwyJiXPEaHhTWXdQaIZbXffjlBzzPSpuzc-dB7hSyFS8Fdoc-LIssmca7capzklxYvRwszaCAzU-py-WEYT-8TYatout8uIb8_GsZwULVO50/s640/06.JPG" width="640" /></a><br />
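One way to draw the normal curve is stat_function with dnorm, passing the sample mean and standard deviation through `args`. This is a sketch on hypothetical data, not necessarily what the gist does:

```r
library(ggplot2)

# Hypothetical data: 200 draws from N(0, 1)
set.seed(1)
df <- data.frame(x = rnorm(200))

# Normal parameters estimated from the data
m <- mean(df$x)
s <- sd(df$x)

# stat_function evaluates dnorm(x, mean = m, sd = s) across the x range
p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5,
                 fill = "grey70", colour = "white") +
  stat_function(fun = dnorm, args = list(mean = m, sd = s), colour = "blue")
print(p)
```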
The last step is to add the counts on top of each bin. I have not figured out how to do it yet. It should be easy with a text layer, but I have not gotten that far.<br />
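One possible approach, sketched on hypothetical data and untested against this post's data, is a second stat_bin layer with `geom = "text"`. It uses the same binwidth as the bars so the labels line up, and `vjust` nudges each label just above its bar. Empty bins will show a label of 0:

```r
library(ggplot2)

# Hypothetical data: 200 draws from N(0, 1)
set.seed(1)
df <- data.frame(x = rnorm(200))

# Bars plus a text layer that reuses the same binning to print each count
p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.5, fill = "grey70", colour = "white") +
  stat_bin(geom = "text", aes(label = after_stat(count)),
           binwidth = 0.5, vjust = -0.5)
print(p)
```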
<br />sas studyhttp://www.blogger.com/profile/13494555392947175879noreply@blogger.com0