This document reviews some commonly used SAS procedures. Before describing the procedures, it is useful to look at several SAS statements that can be used within many of them.

Part 1: Common Procedural Statements

The statements discussed in this section are:

  • The VAR statement
  • The BY statement
  • The TITLE statement
  • The LABEL statement
  • The FORMAT statement
  • The OUTPUT statement

A brief explanation of the statements follows.

The VAR Statement

This optional statement lists the variables to be used in the procedure; the procedure ignores all other variables in the data set. If the VAR statement is omitted, most statistical procedures analyze all numeric variables in the data set.

Form: VAR variable list;

The BY Statement

The BY statement generates a separate analysis for every combination of values of the variables it lists. That is, it lets the programmer perform subgroup analyses on the data. The only restriction is that the data set must already be sorted by those variables. The variables on the BY statement must appear in the same order as on the BY statement of the SORT procedure that sorted the data.

Form: BY variable list;
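
As a minimal sketch of BY-group processing, the following steps sort a small data set and then analyze it one group at a time. The data set SCORES and the variables GROUP and SCORE are made up for illustration.

```sas
/* Hypothetical data: scores for two groups, deliberately unsorted. */
DATA Scores;
   INPUT Group $ Score;
   DATALINES;
B 72
A 85
B 64
A 91
;
RUN;

/* BY-group processing requires input sorted by the BY variables. */
PROC SORT DATA=Scores;
   BY Group;
RUN;

/* One set of statistics is produced for each value of Group. */
PROC MEANS DATA=Scores;
   VAR Score;
   BY Group;
RUN;
```

If the PROC SORT step is left out, the PROC MEANS step should stop with an error saying the data set is not sorted in ascending sequence.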

The TITLE Statement

This statement defines a heading line for each output page. The programmer can define up to 10 separate titles (TITLE1 through TITLE10). In practice, more than three titles leave little room on the page for the output itself. A TITLEn statement with no description removes that title and any higher-numbered titles.

Form: TITLEn 'description'; (n is the title number).
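
A short sketch of how titles might be set and then cleared. The data set VISITS and the title text are hypothetical.

```sas
/* Hypothetical data set used only to produce some output. */
DATA Visits;
   INPUT Clinic $ Count;
   DATALINES;
North 12
South 7
;
RUN;

TITLE1 'Clinic Visits';          /* first heading line  */
TITLE2 'Preliminary Figures';    /* second heading line */

PROC PRINT DATA=Visits;          /* prints with both titles */
RUN;

TITLE2;                          /* removes TITLE2; TITLE1 stays in effect */

PROC PRINT DATA=Visits;          /* prints with TITLE1 only */
RUN;

TITLE;                           /* removes all titles */
```

Titles are global: once defined, they stay in effect for every later procedure until they are changed or removed.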

The LABEL Statement

The LABEL statement lets the programmer attach a descriptive label to a variable. A label can be up to 256 characters long (40 characters in SAS Version 6). The statement can be used in a DATA step or in a procedure step. A label assigned in a DATA step is stored permanently with the variable; a label assigned in a procedure step is in effect only for the duration of that procedure.

Form: LABEL variable1 = 'description'
variable2 = 'description'
…;
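
The difference between a permanent and a temporary label can be sketched as follows. The data set EXAMS here is a made-up two-variable version used only for illustration.

```sas
/* The label for Test1 is assigned in the DATA step,  */
/* so it is stored with the data set.                 */
DATA Exams;
   INPUT Test1 Test2;
   LABEL Test1 = 'Creativity Test';
   DATALINES;
80 75
92 88
;
RUN;

/* The label for Test2 is assigned in the PROC step,  */
/* so it applies to this PROC PRINT only.             */
/* The LABEL option asks PROC PRINT to use labels as  */
/* column headings instead of variable names.         */
PROC PRINT DATA=Exams LABEL;
   LABEL Test2 = 'Abstract Reasoning';
RUN;
```

A later procedure run on EXAMS would still see the label for TEST1 but not the one for TEST2.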

The FORMAT Statement

The FORMAT statement allows the programmer to assign special formats to the values of the variables. This statement can be used both in the data step and the procedure step. When formats are assigned to the variables in the data step the variables are permanently formatted in that way. However, the procedure step allows for temporary formatting of the variables.

Form: FORMAT variable1 format1. variable2 format2. … ;

The OUTPUT Statement

This is an optional statement used in several procedures. This statement requests SAS to output statistics to a new SAS data set. Multiple OUTPUT statements can be used in the same procedure as long as different output data sets are created.

Form: OUTPUT OUT=sasdataset keyword=names;

Where: keyword is a statistic keyword supported by the procedure (for example, MEAN or STD)
and names lists the new variable(s) that will contain the statistics.
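
As a sketch of the OUTPUT statement in use, the step below writes means and standard deviations to a new data set with the MEANS procedure. The data set names EXAMS and STATS and all variable names are hypothetical.

```sas
/* Hypothetical input data. */
DATA Exams;
   INPUT Test1 Test2;
   DATALINES;
80 75
92 88
67 70
;
RUN;

/* NOPRINT suppresses the printed report; the statistics  */
/* go only to the output data set Stats.                  */
/* Names after MEAN= and STD= are matched, in order, to   */
/* the variables on the VAR statement.                    */
PROC MEANS DATA=Exams NOPRINT;
   VAR Test1 Test2;
   OUTPUT OUT=Stats MEAN=Mean1 Mean2 STD=Std1 Std2;
RUN;

/* Inspect the new data set. */
PROC PRINT DATA=Stats;
RUN;
```

Besides the requested statistics, the data set created by PROC MEANS also contains the automatic variables _TYPE_ and _FREQ_.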

Part 2: Common Procedures

The SORT Procedure
The SORT procedure is used to:
* sort the observations in a SAS data set by the values of one or more variables.
* create a data set containing rearranged observations.

By default the observations are sorted in ascending order of each BY variable. To sort in descending order of a variable, place the keyword DESCENDING before the name of that variable.

The BY statement must be used with the SORT procedure.

It is possible to specify the names of the input and output data sets. If no output data set is named, the input data set is overwritten with the sorted version.

Form: PROC SORT DATA=sasdataset OUT=sasdataset;
BY variable list;

Example: PROC SORT DATA=Census;
BY Region State;

Result: The CENSUS data will be sorted by the values of REGION and then sorted by STATE within each REGION.
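
The DESCENDING keyword and the OUT= option can be sketched as follows. This is a made-up version of the CENSUS data; the variable POP and all values are invented for illustration.

```sas
/* Hypothetical data: POP values are invented. */
DATA Census;
   INPUT Region $ State $ Pop;
   DATALINES;
West CA 30
East NY 18
West OR 3
East ME 1
;
RUN;

/* Sort by Region (ascending), then by Pop from largest */
/* to smallest within each Region. OUT= writes the      */
/* result to a new data set, so Census is left as is.   */
PROC SORT DATA=Census OUT=CensusSorted;
   BY Region DESCENDING Pop;
RUN;

PROC PRINT DATA=CensusSorted;
RUN;
```

DESCENDING applies only to the variable that immediately follows it, so REGION is still sorted in ascending order here.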

The PRINT Procedure
The PRINT procedure is used to print observations in a SAS data set. The features of the PRINT procedure are as follows:

* automatic formatting
* columns labeled with variable names
* special handling of page breaks
* optimization of page space
* subgroup printing

Form: PROC PRINT DATA=sasdataset;

Other Statements: TITLE, VAR, BY, etc.;

The FORMAT Procedure

The FORMAT procedure allows you to create formats to your own specifications. These formats can then be used with a FORMAT statement in either a DATA step or a PROC step. Three optional keywords, LOW, HIGH, and OTHER, are available for specifying ranges. In addition, the FORMAT procedure needs a VALUE or a PICTURE statement to define each format.

Form: PROC FORMAT;

The VALUE Statement

The VALUE statement creates a format that associates labels with values or ranges of values.

Form: VALUE formatname
range1 = 'label description'
range2 = 'label description'
… ;

Example: PROC FORMAT;
VALUE Grade
90-HIGH = 'A'
80-89 = 'B'
70-79 = 'C'
LOW-69 = 'F';
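
The ranges in the Grade example cover whole-number marks. A fractional mark such as 89.5 falls between two ranges and would be printed without a label. One way to close the gaps, sketched here under the assumption that fractional marks can occur, is to use the -< range operator, which excludes the upper endpoint. The format name GradeX is hypothetical and is chosen so that the Grade format is not replaced.

```sas
PROC FORMAT;
   VALUE GradeX             /* hypothetical name */
      90 - HIGH  = 'A'
      80 -< 90   = 'B'      /* 80 up to, but not including, 90 */
      70 -< 80   = 'C'
      LOW -< 70  = 'F';
RUN;

/* Apply the format to a single value and write the */
/* result to the SAS log.                           */
DATA _NULL_;
   Marks  = 89.5;
   Letter = PUT(Marks, GradeX.);
   PUT Marks= Letter=;
RUN;
```

With these ranges, a mark of 89.5 should be labeled B rather than left unformatted.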

The PICTURE Statement

The function of the PICTURE statement is to create a format that defines how to convert a numeric value into a character string when the value is printed.

Form: PICTURE picturename
range1 = 'picture'
range2 = 'picture'
… ;

Example: PROC FORMAT;
PICTURE Phone
OTHER = '(000) 000-0000';

Using FORMAT In A Procedure Step

Example: PROC PRINT;
VAR Name Tele Marks;
FORMAT Tele Phone. Marks Grade.;

Result: The values of TELE, when printed, will be formatted as (000) 000-0000 and the values of MARKS will appear as letter grades. The formats will be in effect only for this PRINT procedure.

Using FORMAT In A DATA Step

Example: DATA A;
INPUT Name $ Tele Marks;
FORMAT Tele Phone. Marks Grade.;

Result: The values of TELE will be formatted as (000) 000-0000 and the values of MARKS will appear as letter grades. The formats will be in effect for all future references to TELE and MARKS.

The MEANS Procedure
This procedure produces simple univariate descriptive statistics for numeric variables.

Form: PROC MEANS DATA=sasdataset options;
Options: NOPRINT N MEAN STD MAX MIN SUM VAR etc.

Example: PROC MEANS DATA=Exams MEAN SUM N;
VAR Test1 Test2 Test3;
BY Sex;
LABEL Test1 = 'Creativity Test'
Test2 = 'Abstract Reasoning'
Test3 = 'Mechanical Reasoning';
OUTPUT OUT=Stat MEAN=MeanTst1 MeanTst2 MeanTst3;

Result: Two separate analyses, one for males and the other for females, are performed on the variables TEST1, TEST2, and TEST3. The data set EXAMS must already be sorted by SEX. The statistics presented are the mean, the sum, and the number of valid responses. An output data set STAT is created, containing the mean scores of the three variables for each value of SEX. The output from the MEANS procedure will label the three test variables appropriately.

The FREQ Procedure
The FREQ procedure produces one-way to n-way frequency and cross-tabulation tables.

Form: PROC FREQ DATA=sasdataset options;

The TABLES Statement

Each frequency or cross-tabulation table is requested on a TABLES statement; a single TABLES statement can request several tables. The procedure is used principally for categorical data analysis.

Form: TABLES request / options;

Request: Some examples of request are:
TABLES (A B C)* D;
TABLES (A B) * (C D);
TABLES A*B*C;

Where A, B, C, and D are variables.

Options: CHISQ NOCOL NOCUM NOPERCENT etc.

Example: PROC FREQ DATA=Survey;
TABLES Sex * Schsize / CHISQ;

Result: The two variables, SEX and SCHSIZE, will be cross-tabulated. In addition to the default statistics, the analysis will report the overall value of the chi-square test statistic and its significance value. The variable SEX will be on the vertical side of the table (the rows) and the variable SCHSIZE will be on the horizontal side of the table (the columns).
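
A self-contained sketch of a cross-tabulation request. The SURVEY data set below is a made-up eight-observation version used only for illustration.

```sas
/* Hypothetical survey data: values are invented. */
DATA Survey;
   INPUT Sex $ Schsize $;
   DATALINES;
M Small
M Large
F Small
F Large
M Large
F Small
M Small
F Large
;
RUN;

/* Sex forms the rows and Schsize the columns.          */
/* CHISQ requests chi-square tests; NOCOL and NOPERCENT */
/* drop the column and overall percentages from each    */
/* cell of the table.                                   */
PROC FREQ DATA=Survey;
   TABLES Sex * Schsize / CHISQ NOCOL NOPERCENT;
RUN;
```

With so few observations, SAS is likely to warn that the chi-square test may not be valid; the data here only demonstrate the syntax.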

The PLOT Procedure
The PLOT procedure graphs one variable against another in an xy-plane. It is generally used with continuous data. Each xy-coordinate is marked with a plotting character. This procedure does not produce presentation-quality graphs; its purpose is to give the analyst a quick view of the data. For presentation graphs, use the SAS/GRAPH product.

Form: PROC PLOT DATA=sasdataset;

The PLOT Statement

Include at least one PLOT statement in the PLOT procedure. The PLOT procedure uses only those variables listed on this statement.

Form: PLOT y-axis variable name * x-axis variable name / options;
Options: HREF = list of values at which to draw vertical reference lines
VREF = list of values at which to draw horizontal reference lines
VAXIS = list of values for tick marks on y-axis
HAXIS = list of values for tick marks on x-axis
OVERLAY overlays multiple plots on a single set of axes

Example: PROC PLOT DATA=Bank;
PLOT Rate1 * Year Rate2 * Year / OVERLAY;
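
A runnable sketch of the overlay example with a few of the options listed above. The BANK data set here is made up; the years and rates are invented for illustration.

```sas
/* Hypothetical data: two interest rates by year. */
DATA Bank;
   INPUT Year Rate1 Rate2;
   DATALINES;
1990 8.1 7.5
1991 7.4 6.9
1992 6.3 6.0
1993 5.9 5.2
;
RUN;

/* ='1' and ='2' set the plotting character for each    */
/* request so the two series can be told apart.         */
/* OVERLAY draws both plots on one set of axes.         */
/* VREF=6 draws a horizontal reference line at y = 6.   */
/* HAXIS= sets the tick marks on the x-axis.            */
PROC PLOT DATA=Bank;
   PLOT Rate1 * Year = '1'
        Rate2 * Year = '2' / OVERLAY VREF=6 HAXIS=1990 TO 1993 BY 1;
RUN;
QUIT;
```

The QUIT statement ends the procedure, which otherwise stays active waiting for further PLOT statements.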