Thursday, March 10, 2011

DLM and DSD options in infile statement and & usage

The DLM= option enables you to list read data with delimiters rather than the default space. i.e., if your data are separated by ";", you can use DLM=';' to read them correctly.

The DSD (Delimiter-Sensitive Data) in infile statement does three things for you. 1: it ignores delimiters in data values enclosed in quotation marks; 2: it ignores quotation marks as part of your data; 3: it treats two consecutive delimiters in a row as missing value.

eg1: we want to read this data set named testdata.txt in c:\sasdata.

10034 "apples, grapes kiwi" 123456
92626 "oranges" 97654
25414 "pears apple" 987654

data test;
      infile 'c:\sasdata\testdata.txt' dlm=' ' dsd;
      length name $22;
      input zip name $ pounds;
run;


*********************************************************************

& usage in SAS

& stops reading character variables until it encounters two or more spaces. So a single space to differentiate the character variable and the adjacent variable will be ignored and the two variables will be treated as one variable.

eg2:
data web;
input age site & $41. hits;
datalines;
12  http://www.site1.org/default.htm   123456
30  http://www.site2.com/index.htm   97654
54  http://www.site3.edu/department/index.htm    987654
;

data two;
input name & 20. age;
cards;
james band   25
barak obama   26
jintao hu   25
;

6 comments:

  1. Great. Explained in a simple way.Thanks.

    ReplyDelete
  2. hi,

    i am new to sas programming, i was trying to use multiple delimiter options, for the following cards
    001*Alfred, M,24,10000,"a-nagar,Hyderabad"
    002*Alice, F,23,10000,"b-nagar,Hyderabad"
    003*Barbara,F,23,20000,"a-nagar,Hyderabad"

    code:
    data SPECIAL;
    infile datalines dlm='* ,';
    input ID NAME$ SEX$ AGE SALARY ADDRESS:$20.;
    datalines;
    001*Alfred, M,24,10000,"a-nagar,Hyderabad"
    002*Alice, F,23,10000,"b-nagar,Hyderabad"
    003*Barbara,F,23,20000,"a-nagar,Hyderabad"
    ;
    run;

    output:
    Obs ID NAME SEX AGE SALARY ADDRESS
    1 1 Alfred M 24 10000 "a-nagar
    2 2 Alice F 23 10000 "b-nagar
    3 3 Barbara F 23 20000 "a-nagar

    the output is not complete or as i expected and the last column is truncated.

    can you help to suggest the correct usage of delimiters to have the complete data in output.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. REMOVE THE EXTRA SPACE BEFORE M & F IN FIRST 2 OBS UNDER GENDER VARIABLE AS IT CONSIDERS 2 CONSECUTIVE DELIMITERS (COMMA & SPACE) AS MISSING

    ReplyDelete
  5. data infile_c;
    infile '/home/u59973295/sasuser.v94/infile/c.txt' dlm=',-*';
    input id name$ sex$ age sal area$ 25-41;
    run;
    data infile_c1;
    set infile_c;
    area=tranwrd (area,'-',' ');
    area=tranwrd (area,'*',' ');
    run;

    ReplyDelete