Wednesday, December 1, 2010

SAS Base tips

1: The SAS system option that displays the date and time on a report:  DATE

2: The ODS option that closes (terminates) HTML output:  CLOSE, as in ods html close;

3: The SAS system option that prevents page numbers from appearing on a report: NONUMBER

4: drop/keep: use either the data set option (drop=variables) or the statement drop variables;   // 4-1

5: label song='song huiming';   1-5

6: format and informat. A format tells SAS how to write data values, e.g. DOLLAR8.2 displays 1234 as $1234.00. An informat tells SAS how to read raw data into SAS data values, e.g. COMMA10. reads $12,345.00 and stores it as 12345.  (format 1-5 pg 47)
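A minimal sketch of the difference, assuming a hypothetical data set named pay with one variable named amount (neither name comes from the notes above):

```sas
/* Hypothetical data set and variable names, for illustration only. */
data pay;
   /* The COMMA10. informat strips the dollar sign and comma while reading. */
   input amount comma10.;
   datalines;
$12,345.00
;
run;

proc print data=pay;
   /* The format changes only how the stored number is displayed. */
   format amount dollar10.2;
run;
```

The stored value is the plain number; removing the FORMAT statement shows it without the dollar sign or comma.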

7: libname song SPSS 'c:\'; assigns the libref song to c:\ and names the SPSS engine, so SAS can read SPSS files through that libref. If you omit the engine name, SAS uses its default engine and the library stores SAS files. LIBNAME and FILENAME statements are global. Librefs and filerefs remain in effect until you change them, cancel them, or end your SAS session.    1-6

8: proc contents data=mylib._all_ nods; NODS suppresses the printing of detailed information about each file when you specify _ALL_. By default, PROC CONTENTS and PROC DATASETS list variables alphabetically. To list variable names in the order of their logical position (creation order) in the data set, specify the VARNUM option.

9: options nonumber nodate; suppresses the page number and the current date.

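10: Other system options:   1-4
     ERRORS=n              maximum number of observations for which complete data error messages are printed
     FMTERR | NOFMTERR     whether SAS generates an error when a variable's format cannot be found
     SOURCE | NOSOURCE     whether SAS source statements are written to the log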

11: PROC SORT treats missing values as the smallest possible values.   1-5

12: When the ID statement specifies the same variable as the BY statement in PROC PRINT:    1-5
    *  the Obs column is suppressed
    * the ID/BY variable is printed in the left-most column
    * each ID/BY value is printed only at the start of each BY group and on the line that contains that group's subtotal.
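A short sketch of this layout, assuming a hypothetical data set named sales with variables region and amount:

```sas
/* Hypothetical data, for illustration only. */
data sales;
   input region $ amount;
   datalines;
East 100
West 250
East 75
West 30
;
run;

/* BY-group processing in PROC PRINT needs sorted (or indexed) data. */
proc sort data=sales;
   by region;
run;

proc print data=sales;
   by region;
   id region;     /* same variable as BY, so the Obs column is suppressed */
   sum amount;    /* adds a subtotal line for each BY group */
run;
```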

13: The DOUBLE option in the PROC PRINT statement double-spaces the report.

14: Redefining a title or footnote line cancels any higher-numbered title or footnote lines, respectively.   1-5

15: Stored value   Format      Displayed value
    38245.3975     DOLLAR9.2   $38245.40
    38245.3975     DOLLAR8.2   38245.40

16: To limit the number of records read from a raw data file, use the OBS= option in the INFILE statement:     infile 'c:\s1.dat' obs=10;

17: DATA step does not fail as a result of the invalid data but continues to execute. Unlike syntax errors, invalid data errors do not cause SAS to stop processing a program.

18: If your data contains semicolons, use the DATALINES4 statement plus a null statement that consists of four semicolons (;;;;).  1-6
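As a sketch, assuming a hypothetical data set named notes whose single character variable holds lines that contain semicolons:

```sas
/* Hypothetical names, for illustration only. */
data notes;
   input text $char40.;
   /* DATALINES4 lets the data lines themselves contain semicolons. */
   datalines4;
first; second
x = 1; y = 2;
;;;;
run;

proc print data=notes;
run;
```

With a plain DATALINES statement, the first semicolon inside the data would end the data block early.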

19: If you do not execute a FILE statement before a PUT statement in the current iteration of the DATA step, SAS writes the lines to the SAS log. If you specify the PRINT option in the FILE statement, before the PUT statement, SAS writes the lines to the procedure output file.      1-6
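A minimal sketch of both destinations in one step, assuming the sample data set sashelp.class that ships with SAS is available:

```sas
data _null_;
   set sashelp.class;
   /* No FILE statement has executed yet in this iteration,
      so this line is written to the SAS log. */
   put name age;
   /* FILE PRINT redirects the PUT statements that follow it
      to the procedure output file. */
   file print;
   put name age;
run;
```

Note that name is a character variable, yet the PUT statement lists it without a dollar sign.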

20: Because you are creating raw data ( data _NULL_ ), you don't need to follow character variable names with a dollar sign ($) in the PUT statement.     1-6

21: At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0. The remaining variables are initialized to missing. Missing numeric values are represented by periods, and missing character values are represented by blanks.

22: When reading variables from raw data, SAS sets the value of each variable in the DATA step to missing at the beginning of each cycle of execution, with these exceptions:
    * variables that are named in a RETAIN statement
    * variables that are created in a sum statement
    * data elements in a _TEMPORARY_ array
    * any variables that are created with options in the FILE or INFILE statements
    * automatic variables.               1-7-pg18
    In contrast, when reading variables from a SAS data set (with a SET, MERGE, or UPDATE statement), SAS sets the values to missing only before the first cycle of execution of the DATA step. Thereafter, the variables retain their values until new values become available.

23: The format-name
    * must begin with a dollar sign ($) if the format applies to character data
    * cannot end with a number
    * cannot be longer than eight characters in SAS 8 and earlier (SAS 9 allows up to 32 characters, counting the $ of a character format)
    * cannot be the name of an existing SAS format
    * does not end in a period when specified in a VALUE statement.

24: In PROC FORMAT, you can also use the keywords LOW and HIGH to specify the lower and upper limits of a variable's value range. The keyword LOW does not include missing numeric values. If applied to a character format, the keyword LOW includes missing character values. The keyword OTHER can be used to label missing values as well as any values that are not specifically addressed in a range.    2-1-11
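A sketch of these keywords in a VALUE statement, assuming a hypothetical format named agegrp and the sample data set sashelp.class:

```sas
proc format;
   /* agegrp is a hypothetical format name: no $ because it is numeric,
      and no trailing period in the VALUE statement. */
   value agegrp
      low -< 13  = 'Child'
      13  -< 20  = 'Teen'
      20 - high  = 'Adult'
      other      = 'Unknown';   /* picks up missing values */
run;

proc print data=sashelp.class;
   var name age;
   format age agegrp.;   /* the period is required when the format is used */
run;
```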

25: When the specified values are character values, they must be enclosed in quotation marks and must match the case of the variable's values. When the specified values are numeric values, they are not enclosed in quotation marks, and the format's name should not begin with a dollar sign ($).   2-1-9

26: You can delete formats using PROC CATALOG or the SAS Explorer window.

27: If you do not format all of a variable's values, then those that are not listed in the VALUE statement are printed as they appear in the SAS data set.

28: Adding the keyword FMTLIB to the PROC FORMAT statement displays a list of all the formats in your catalog

29: PROC MEANS: the difference between CLASS and BY:          2-3
   1.  Unlike CLASS, BY processing requires that your data already be sorted or indexed in the order of the BY variables.
   2.  The BY statement creates several small tables, but the CLASS statement produces a single large table.
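The two approaches can be compared with a short sketch, assuming the sample data set sashelp.class and a hypothetical sorted copy named class_sorted:

```sas
/* CLASS: no sorting needed, one table with a row per class level. */
proc means data=sashelp.class;
   class sex;
   var height weight;
run;

/* BY: the data must be sorted (or indexed) by the BY variable first. */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

/* One separate table is produced for each BY group. */
proc means data=class_sorted;
   by sex;
   var height weight;
run;
```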

30: PROC MEANS produces a report by default (use the NOPRINT option to suppress it). PROC SUMMARY produces an output data set by default; to produce a report, you must include the PRINT option in the PROC SUMMARY statement.

30: Whether you use PROC SUMMARY or PROC MEANS, the output variable names listed after a statistic keyword in the OUTPUT statement must be in the same order as the variables in the VAR statement. 2-3
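A minimal sketch of that positional matching, assuming the sample data set sashelp.class and hypothetical output names stats, avg_height, and avg_weight:

```sas
proc summary data=sashelp.class;
   var height weight;
   /* avg_height pairs with height and avg_weight with weight,
      purely by position in the list. */
   output out=stats mean=avg_height avg_weight;
run;

/* PROC SUMMARY prints nothing by default, so print the output data set. */
proc print data=stats;
run;
```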

31: To specify the variables to be processed by the PROC FREQ procedure, include a TABLES statement. In n-way tables, the last two variables of the TABLES statement become the two-way rows and columns. To generate list output for crosstabulations, add / LIST to the TABLES statement in your PROC FREQ step.

32: To suppress cell statistics in PROC FREQ crosstabulations, use the NOFREQ, NOPERCENT, NOROW, and NOCOL options after a slash (/) in the TABLES statement.
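Both kinds of TABLES options can be sketched as follows, assuming the sample data set sashelp.class:

```sas
proc freq data=sashelp.class;
   /* List layout instead of a crosstabulation grid. */
   tables sex*age / list;
   /* Crosstabulation that keeps only the cell frequencies. */
   tables sex*age / nopercent norow nocol;
run;
```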

33: The RETAIN statement
    * is a compile-time only statement that creates variables if they do not already exist
    * initializes the retained variable to missing before the first execution of the DATA step if you do not supply an initial value
    * has no effect on variables that are read with SET, MERGE, or UPDATE statements.

34: In SAS, any numeric value other than 0 or missing is true, and a value of 0 or missing is false.  4-1-9

35: When creating a new character variable in an assignment statement, SAS allocates as many bytes of storage space as there are characters in the first value that it encounters for that variable. To avoid truncating longer values, declare the length first, e.g. length LastName $ 16;   4-1-15
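A small sketch of the truncation, assuming hypothetical variables short_name (no LENGTH statement) and full_name (declared with a length):

```sas
/* Hypothetical data set and variable names, for illustration only. */
data names;
   length full_name $ 16;        /* declared before the first assignment */
   input code;
   if code = 1 then short_name = 'Lee';            /* sets the length to 3 */
   else             short_name = 'Featherstone';   /* stored truncated */
   if code = 1 then full_name = 'Lee';
   else             full_name = 'Featherstone';    /* stored in full */
   datalines;
1
2
;
run;

proc print data=names;
run;
```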

36: The DROP statement:
    * You cannot use the DROP statement in SAS procedure steps.
    * The DROP statement applies to all output data sets that are named in the DATA statement. To exclude variables from some data sets but not from others, use the DROP= data set option in the DATA statement.  (see 4-1-21)

37: If the result of all SELECT-WHEN comparisons is false and no OTHERWISE statement is present, SAS issues an error message and stops executing the DATA step.

38: If more than one WHEN statement has a true when-expression, only the first WHEN statement is used; once a when-expression is true, no other when-expressions are evaluated.
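A sketch of a SELECT group, assuming the sample data set sashelp.class and hypothetical names graded and group:

```sas
data graded;
   set sashelp.class;
   length group $ 8;
   select;
      when (age < 13) group = 'Child';
      /* Also true for ages under 13, but never reached for them:
         only the first true WHEN is used. */
      when (age < 16) group = 'Teen';
      /* Without OTHERWISE, an age of 16 or more would cause an error. */
      otherwise       group = 'Older';
   end;
run;
```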

39: If the expression in a sum statement produces a missing value, the sum statement ignores it. (Remember, however, that assignment statements assign a missing value if the expression produces a missing value.)
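The contrast with an assignment statement can be sketched like this, assuming hypothetical variables total_a (assignment plus RETAIN) and total_b (sum statement):

```sas
/* Hypothetical names, for illustration only. */
data running;
   input amount;
   retain total_a 0;
   /* Assignment: a missing amount makes total_a missing,
      and it stays missing because it is retained. */
   total_a = total_a + amount;
   /* Sum statement: the missing amount is ignored;
      total_b is retained and starts at 0 automatically. */
   total_b + amount;
   datalines;
10
.
5
;
run;

proc print data=running;
run;
```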

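40: When you use the POINT= option to read observations directly, you must use a STOP statement to terminate the DATA step, because SAS never reaches an end-of-file marker.    4-2-15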
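A minimal sketch, assuming the sample data set sashelp.class and hypothetical names fifth and obsnum:

```sas
data fifth;
   obsnum = 5;                        /* observation number to read */
   set sashelp.class point=obsnum;    /* direct access */
   output;                            /* write it explicitly ... */
   stop;                              /* ... then end the step, or it loops forever */
run;
```

The POINT= variable is temporary and is not written to the output data set.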

41: Using an OUTPUT statement without a following data set name causes the current observation to be written to all data sets that are named in the DATA statement.

42: When concatenating data sets with a SET statement (SET A B;), take care:   4-3-7
    * Any common variable must have the same type attribute, or SAS stops processing the DATA step and issues an error message stating that the variables are incompatible.
    * However, if the length attribute is different, SAS takes the length from the first data set that contains the variable.
    * The same is true for the label, format, and informat attributes: If any of these attributes are different, SAS takes the attribute from the first data set that contains the variable with that attribute.

43: Automatic character-to-numeric conversion occurs when a character value is
    * assigned to a previously defined numeric variable, such as the numeric variable Rate:   Rate=payrate;
    * used in an arithmetic operation                                                         Salary=payrate*hours;
    * compared to a numeric value, using a comparison operator                                if payrate>=rate;
    * specified in a function that requires numeric arguments.                                NewRate=sum(payrate,raise);
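A sketch of one of these cases, assuming a hypothetical data set named wages. The explicit INPUT function call in the last assignment is not part of the list above; it is shown only for comparison:

```sas
/* Hypothetical names, for illustration only. */
data wages;
   payrate = '12.50';    /* character variable */
   hours   = 40;
   /* Arithmetic on a character value triggers automatic conversion;
      SAS writes a note about it to the log. */
   salary = payrate * hours;
   /* Explicit conversion with the INPUT function, for comparison. */
   salary2 = input(payrate, 8.) * hours;
run;
```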

44: The SCAN function assigns a length of 200 to each target variable that has not already been given a length.

45: If you specify an invalid date in the MDY function, a missing value is assigned to the target variable.

46: The expression is not evaluated until the bottom of the loop, so a DO UNTIL loop always executes at least once. 4-7-24
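A sketch of this behavior next to DO WHILE, which tests its condition at the top of the loop; the variable names are hypothetical:

```sas
data _null_;
   count_until = 10;
   /* The condition is already true, but it is checked at the bottom,
      so the body runs once. */
   do until (count_until >= 5);
      count_until = count_until + 1;
   end;

   count_while = 10;
   /* The condition is checked at the top and is false, so the body never runs. */
   do while (count_while < 5);
      count_while = count_while + 1;
   end;

   put count_until= count_while=;   /* writes both values to the log */
run;
```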

47: You cannot use array names in LABEL, FORMAT, DROP, KEEP, or LENGTH statements. Arrays exist only for the duration of the DATA step. They do not become part of the output data set.

48: For list input, the default length for variables is 8.

49: With formatted input, the informat determines both the length of character variables and the number of columns that are read. input @3 City $12.;
    The informat in modified list input determines only the length of the variable, not the number of columns that are read.  input City & $12.;

50:  Column       standard data values in fixed fields
     Formatted    nonstandard data values in fixed fields
     List         data values that are not arranged in fixed fields, but are separated by blanks or other delimiters

51: Because variable attributes are defined when the variable is first encountered in the DATA step, a variable that is defined in a LENGTH statement (if it precedes an INPUT statement) will appear first in the data set, regardless of the order of the variables in the INPUT statement.

52: The minimum acceptable field width for the TIMEw. informat is 5. If you specify a w value less than 5, you'll receive an error message in the SAS log.

53: The double trailing at sign (@@) holds a record across multiple iterations of the DATA step until the end of the record is reached.
    The single trailing at sign (@) releases a record when control returns to the top of the DATA step.
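A minimal sketch of the double trailing at sign, assuming a hypothetical data set named scores in which each record holds several id/score pairs:

```sas
/* Hypothetical names, for illustration only. */
data scores;
   /* @@ holds the record so the next iteration keeps reading from it. */
   input id score @@;
   datalines;
1 90 2 85 3 77
4 60 5 95
;
run;

proc print data=scores;
run;
```

Each id/score pair becomes its own observation, so the two records yield five observations.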
