Monday, September 12, 2011

Repost: Structuring SAS Programs



The primary aim in writing SAS code is to get a job done. But the job is not just to produce some output. Structure your SAS programs to:
• make finding errors as easy as possible
• make the code easy to understand later by yourself or someone else
• document the time and stage of the analysis
• document data edits and corrections

Most projects have an initial data cleaning phase, followed by analysis for a paper. After the paper is submitted, there is a delay until the referee reports arrive, followed by another round of analysis and revisions. Interruptions of weeks or months are common, so write your code to make it easy to pick up later.

Advice:
1. At the top of each program, list the program name, date, author, revision date(s), and what the program does. Use comments to break the code into sections.
2. As far as possible, edit the data and create new variables in a single data step at the beginning of the program. This makes problems easy to find and correct.
3. Use as few data steps as possible. Data steps are often the most confusing part of a program.
4. Use comments to explain data edits and identify data problems. Hard code all data corrections in the program. If there is email or other documentation, include it as comments in the code, so you don't need to search your mail files later to figure out what happened.
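
A minimal sketch of a program laid out along these lines follows. It is not from the original post: the program name, variables, data values, and the quoted email are all hypothetical. It shows a header block, comment-delimited sections, and a single data step that holds every edit, with each hard-coded correction documented next to the statement that makes it. The datalines stand in for a real raw data file.

```sas
/*********************************************************************
 * Program : PlanB_01.sas   (hypothetical name)
 * Author  : <your name>
 * Written : <date>
 * Revised : <date>  corrected age for id 1042 (see Section 2)
 * Purpose : Read the raw data, apply all data edits and corrections,
 *           and derive new variables. No analysis in this program.
 * NOTE    : All ids, values, and the email quoted below are
 *           hypothetical, for illustration only.
 *********************************************************************/

/*=== Section 1: read the raw data ===*/
data work.raw;
  input id age weight_lb height_in;
  datalines;
1041 34 160 68
1042 340 150 65
1043 51 . 70
;
run;

/*=== Section 2: ALL edits and new variables, in one data step ===*/
data work.clean;
  set work.raw;

  /* Correction (hard coded): id 1042 age was keyed as 340.
     Email from the data manager, <date> (hypothetical):
       "Age for 1042 should be 34; typo at data entry."          */
  if id = 1042 and age = 340 then age = 34;

  /* Known problem: id 1043 has no weight on the source form.
     Left missing on purpose; not imputed here.                   */

  /* New variable: body mass index from pounds and inches */
  if nmiss(weight_lb, height_in) = 0 then
    bmi = 703 * weight_lb / height_in**2;
run;
```

Guarding the correction with `age = 340` means the statement only fires on the known bad value, so it does nothing if the source data are later fixed upstream.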


This advice also applies to a collection of programs written for a project:
1. Number the programs in their file names as you write them: PlanB_01.sas, PlanB_02.sas, etc.
2. Do all the data editing and create all permanent datasets in the first program, with no analysis. Hard code all data corrections there, and include email correspondence as comments in the code.
3. Perform the analysis in later programs that simply read the permanent datasets. Analysis programs should create only temporary datasets.
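
For a multi-program project, the split between the first program and the analysis programs might look like the sketch below, again with hypothetical names. It assumes a LIBNAME pointing to an existing project folder (the path is a placeholder). Datasets written with a two-level name such as planb.survey are permanent; one-level names go to the temporary WORK library and disappear at the end of the session. The two files are shown in one block for brevity.

```sas
/*--------------------------------------------------------------
  PlanB_01.sas : data editing only; creates the permanent dataset.
  The folder below is a hypothetical placeholder and must exist.
  --------------------------------------------------------------*/
libname planb "/projects/planb/data";

data planb.survey;                /* two-level name: permanent */
  input id age weight_lb height_in;
  if id = 1042 and age = 340 then age = 34;   /* documented correction */
  if nmiss(weight_lb, height_in) = 0 then
    bmi = 703 * weight_lb / height_in**2;
  datalines;
1041 34 160 68
1042 340 150 65
1043 51 . 70
;
run;

/*--------------------------------------------------------------
  PlanB_02.sas : analysis only; reads the permanent dataset and
  creates temporary (WORK) datasets only.
  --------------------------------------------------------------*/
libname planb "/projects/planb/data" access=readonly;

data adults;                      /* one-level name: WORK.ADULTS, temporary */
  set planb.survey;
  where age >= 18;
run;

proc means data=adults n mean std;
  var bmi;
run;
```

Opening the library with ACCESS=READONLY in the analysis programs is an optional safeguard: an analysis program then cannot overwrite or add a second version of the cleaned data, so only PlanB_01.sas can change it.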

Multiple versions of the same data invite trouble.

1 comment:

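  1. Structuring SAS Programs

    The main purpose of writing a SAS program is to get the job done, not merely to produce output. So structure your SAS programs to:


    make errors in the program very easy to spot
    make the program understandable to yourself or others after some time has passed
    record the time and the steps of the analysis
    record data edits and changes


    Many projects begin by cleaning up the raw data, and only then come the analysis and the conclusions. After the report is submitted, it usually takes a while before feedback comes back, and then another round of analysis and revision follows. A project often takes several weeks or even several months to finish. So write your programs clearly, so that when you come back to them later you can easily tell what you did at the time.

    Advice:
    1: At the top of the program, list the program name, date, author, the date of each revision, and the purpose of the program. Use comments to divide the program into sections.
    2: As far as possible, edit the data and create new variables in a single data step at the very beginning of the program. This makes problems easier to find and fix.
    3: Use as few data steps as possible. Most of the time, it is the data steps that make a program confusing.
    4: Every time you edit the data or find a data problem, record it in a comment, and keep a record of every data change. If there is email or other documentation, record it in comments as well. This makes things easy to look up later.

    The following advice also applies to managing the many programs of a project:
    1: Put numbers in the program names, e.g. PlanB_01.sas, PlanB_02.sas, and so on.
    2: Edit the data and create the permanent datasets in the first program; do not do any analysis there.
    3: Write the analysis in other programs that read the permanent datasets; create only temporary datasets during analysis.

    Multiple versions of the same data easily lead to all kinds of problems.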
