SAS Tutorial 2

The purpose of this document is to present the following concepts:

Commonly Used Modifiers
Forms of INPUT statements
Reading Multiple Records Per Observation

Part 1: Commonly Used Modifiers

By default, SAS assumes that all data values are numeric. If any variable in the data set has non-numeric values, use one or more of the following three modifiers, in whatever combination matches the form of the underlying data stream. (See the section on Formatted Input for examples.)

$ indicates that a variable holds character values, with a default length of eight (8) characters and no embedded blanks.

& indicates that a character value may contain one or more single embedded blanks. The first occurrence of at least two consecutive blanks marks the end of the value.

: indicates that the data value is read from the next nonblank column until the pointer reaches the next blank column, the end of the data line, or the width of the informat that follows the modifier. Paired with an informat such as $14., it lets the user read character values longer than eight (8) characters that have no embedded blanks.
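
As a minimal sketch of the three modifiers working together, the DATA step below reads a short code, a name with an embedded blank, and a product name. The data set name Modifiers, the variable Abbrev, and the $10. width are assumptions chosen for illustration; the values are borrowed from the examples later in this tutorial.

```sas
DATA Modifiers;
   /* $ alone:  character value, default length 8, no embedded blanks   */
   /* & $14.:   single embedded blanks allowed; two blanks end the value */
   /* : $10.:   read up to the next blank, at most 10 characters         */
   INPUT Abbrev $ State & $14. Product : $10. Pop;
   CARDS;
NC North Carolina  Pins 5.082
SC South Carolina  Needles 0.590
VA Virginia  Cushions .
;
RUN;
```

Each State value is followed by two blanks. With only one blank, SAS would keep reading into the Product value.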

Part 2: Forms Of INPUT Statement

List Input
Use the List input mode to read data recorded with at least one blank space separating each data field. Missing values are represented as a dot (period).

Form: INPUT variable list < modifiers >;

Example:

DATA Census;
INPUT State $ Pop;
CARDS;
NC 5.082
SC 2.590
VA .
;
RUN;

Column Input
Use Column input mode to read the following types of data:
Standard character and numeric data
Data values which are entered in fixed column positions
Character values longer than eight characters
Character values that contain embedded blanks
Form: INPUT variable < modifier > startcol-endcol;
Example:
DATA Census;
INPUT State $ 1-2 Pop 3-7;
CARDS;
NC5.082
SC .590
VA .
;
RUN;

Formatted Input
Use formatted input mode to read the following:
Data in fixed column positions (column input is also a viable choice)
Nonstandard numeric and character data
Data whose location is determined by other data values
Form: INPUT pointercontrol variable < modifiers > informat;
Pointer Controls:
@n go to column n
+n move the pointer forward n columns
#n advance to the first column of the nth record
@ hold the current input line so that a later INPUT statement can read more values from it
@@ hold the current input line across observations; useful when each input line contains values for several observations
Informats:
w. numeric width (will also advance the pointer)
w.d numeric width with an implied decimal
$w. character width
Example:
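DATA Census;
INPUT State & $14. Product : $8. Pop @@;
CARDS;
North Carolina  Pins 5.082 South Carolina  Needles 0.590
Virginia  Cushions .
;
RUN;
Two blanks follow each State value because the & modifier ends a value only at two consecutive blanks.
Note: For further details on informats, refer to the SAS Language Reference, Version 6.0.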
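
The pointer controls and the w.d informat listed above can be sketched as follows. Both DATA steps are illustrations under stated assumptions, not layouts taken from a real file: the data set names Implied and ByType, the variable RecType, and the column positions are hypothetical.

```sas
DATA Implied;
   /* @1 goes to column 1; $2. reads columns 1-2          */
   /* +1 skips column 3; 4.3 reads columns 4-7 and, when  */
   /* the field has no decimal point, divides by 1000     */
   INPUT @1 State $2. +1 Pop 4.3;
   CARDS;
NC 5082
SC 0590
;
RUN;

DATA ByType;
   /* The trailing @ holds the line so that a second INPUT  */
   /* statement can decide where Pop is located (assumed:   */
   /* columns 3-7 for type 1 records, 9-13 for all others)  */
   INPUT RecType $1. @;
   IF RecType = '1' THEN INPUT @3 Pop 5.;
   ELSE INPUT @9 Pop 5.;
   CARDS;
1 5.082
2       0.590
;
RUN;
```

In the first step the implied decimal turns 5082 into 5.082 and 0590 into 0.590. The second step is one way to handle data whose location is determined by other data values.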

Named Input
Use Named input to read data lines containing variable names followed by an equal sign and a value for the variable.
Form:
INPUT variable= informat.; -or-
INPUT variable=modifier; -or-
INPUT variable=startcol-endcol;
Example:
DATA Census;
INPUT State= $14. Pop=;
CARDS;
STATE=North Carolina  Pop=5.082
STATE=South Carolina  Pop=0.590
STATE=Virginia Pop=.
;
RUN;

General Notes on INPUT
All forms of input, except the named input, can be used in any combination.
Once you start reading data using the named input, all remaining variables must be read using the named input.
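
To illustrate the first note, here is a sketch that combines column, formatted, and list input in a single INPUT statement. The data set name Mixed and the column layout are assumptions made for this illustration.

```sas
DATA Mixed;
   /* State:   column input, columns 1-2                       */
   /* Pop:     formatted input, 5 columns starting at column 4 */
   /* Product: list input, the next nonblank field             */
   INPUT State $ 1-2 @4 Pop 5. Product $;
   CARDS;
NC 5.082 Pins
SC 0.590 Needles
VA .     Cushions
;
RUN;
```

Product is read last with list input, so it only needs to be separated from the Pop field by at least one blank.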

Part 3: Reading Multiple Records Per Observation

There are three techniques for reading multiple records of data to create a SAS data set:
1. Prepare one INPUT statement for every record. That is, if there are three records per observation then you would have three INPUT statements.
2. Use a slash / in a single INPUT statement to indicate that the next record is to be placed into the input buffer.
3. Use #n to advance to the first column of the nth record.

Example: Assume that the data consists of two records per observation. The variables on the second record are Y30-Y50. The following examples are equivalent.
1. INPUT State $ 1-2 Area 4-8 (Y10-Y20) (5.);
INPUT (Y30-Y50) (5.);
2. INPUT State $ 1-2 Area 4-8 (Y10-Y20) (5.) / (Y30-Y50) (5.);
3. INPUT #1 State $ 1-2 Area 4-8 (Y10-Y20) (5.) #2 (Y30-Y50) (5.);
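
A complete, runnable version of the third form might look like the sketch below. To keep the data lines short it assumes only three variables per range (Y10-Y12 and Y30-Y32) instead of the full ranges above. The data set name TwoRec and all numeric values are made-up placeholders.

```sas
DATA TwoRec;
   /* #1 and #2 select the first and second record of each observation */
   /* Record 1: State in columns 1-2, Area in 4-8, then three 5-column */
   /*           fields starting at column 9                            */
   /* Record 2: three 5-column fields starting at column 1             */
   INPUT #1 State $ 1-2 Area 4-8 (Y10-Y12) (5.)
         #2 (Y30-Y32) (5.);
   CARDS;
NC 12345    1    2    3
   30   31   32
SC 23456    4    5    6
   40   41   42
;
RUN;
```

Because the highest record number in the INPUT statement is #2, SAS reads two data lines for every observation, so the four lines above yield two observations.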