Aids for Survey Analysis


The SURVEY command produces 1-way and multi-way tables, with extensive capabilities for labelling and controlling the content and appearance of the final result.

The labels for the tables are supplied in titles and by the use of one or more files of labels. A labels file can contain extended labels for the variables or fields in the file, labels for the values of those variables, and strings to be substituted for program-supplied labels such as "mean" or "totals". Individual labels can be up to 80 characters. Multiple extended labels can be supplied so that an entire question can be printed as part of the table.

Parts of a single table can be identified and moved or omitted. Thus a table can have the totals at the top or at the bottom, or left out completely. Row totals can be placed on the left or the right. Subtotals or nets can be grouped together before or after the body, or interleaved appropriately.

A wide variety of statistics can be requested including means, medians, chi-square, F-tests, t-tests and significance tests for the cells.

The final result can be printed to disk, to the screen, or to a printer or print queue. PostScript controls can be given with simple keywords to produce camera-ready copy with attractive fonts.

The BALANCE command does sample balancing so that the marginals and totals in tables match those of the population or other control group whose marginals and totals are supplied.

Weights are computed for each unique combination of the balance variables (that is, for each cell defined by the balance variables). Output files of the actual and adjusted cell frequencies, the observed and adjusted marginals, and the input data plus the generated weights may be requested. The original input file may itself be weighted.
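
The idea of cell weights that make the marginals match a set of control totals can be illustrated outside the program. The Python sketch below uses raking (iterative proportional fitting) on a two-way table. This is an assumption made for illustration only: the text does not say which algorithm BALANCE uses, and the function name rake and its arguments are hypothetical, not part of the command.

```python
def rake(counts, row_targets, col_targets, iterations=100, tol=1e-9):
    """Adjust a two-way table of cell counts so that its row and column
    totals match the supplied targets (illustrative raking sketch).

    Returns (adjusted, weights), where weights[i][j] is the factor that
    turns the actual count in cell (i, j) into its adjusted count.
    """
    adjusted = [[float(c) for c in row] for row in counts]

    for _ in range(iterations):
        # Scale each row so its total matches the row target.
        for i, target in enumerate(row_targets):
            total = sum(adjusted[i])
            if total > 0:
                factor = target / total
                adjusted[i] = [v * factor for v in adjusted[i]]

        # Scale each column so its total matches the column target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in adjusted)
            if total > 0:
                factor = target / total
                for row in adjusted:
                    row[j] *= factor

        # Stop once the row totals still match after the column step.
        if all(abs(sum(adjusted[i]) - t) <= tol
               for i, t in enumerate(row_targets)):
            break

    # One weight per cell: adjusted frequency divided by actual frequency.
    weights = [
        [a / c if c else 0.0 for a, c in zip(adj_row, cnt_row)]
        for adj_row, cnt_row in zip(adjusted, counts)
    ]
    return adjusted, weights


if __name__ == "__main__":
    # Actual cell counts for two balance variables with two values each.
    actual = [[30, 20], [10, 40]]
    adjusted, weights = rake(actual, row_targets=[60, 40], col_targets=[50, 50])
    for row in weights:
        print([round(w, 4) for w in row])
```

Every case in a given cell receives that cell's weight, so tables built from the weighted cases reproduce the target marginals. The row and column targets must sum to the same grand total for the procedure to settle.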


The SAMPLE command selects a random sample of a specified number of cases from an input file. A percent of the file can be requested instead of a specific number of cases.

The command uses random numbers whose sequence is initialized uniquely each time the command is run. A different sample will therefore most likely be selected even when the command is rerun without change.

BY variables can be supplied to define subgroups within the file. If used, each such subgroup will be sampled to the same extent as the file as a whole.

For example, if 40 cases share the same values on the BY variables AGE and REGION, a 25% sampling will select exactly 10 of them. A group of 41 would cause 11 to be selected, because the default, given 10.25, is to round up to the next integer (a CEILING function). The number of selected cases in the sample will therefore tend to be a few more than the requested number. If BY is not used, the entire file is treated as one group.
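
The per-group arithmetic described above can be sketched in Python. This is not the SAMPLE command itself: the function name sample_by_groups, its arguments, and the representation of cases as dictionaries are assumptions made for the illustration. Only the rounding rule (a ceiling applied within each BY group) is taken from the text.

```python
import math
import random
from collections import defaultdict
from fractions import Fraction


def sample_by_groups(cases, by, percent, rng=None):
    """Select `percent` of the cases within each BY group, rounding the
    group's quota up to the next integer (illustrative sketch).

    cases   -- list of dicts, one per case
    by      -- list of variable names defining the groups (may be empty)
    percent -- sampling percentage, as an int or Fraction
    rng     -- optional random.Random; a fresh one is used if omitted
    """
    rng = rng or random.Random()

    # Cases with identical values on all BY variables form one group.
    # With no BY variables every key is (), so the file is one group.
    groups = defaultdict(list)
    for case in cases:
        key = tuple(case[name] for name in by)
        groups[key].append(case)

    selected = []
    for members in groups.values():
        # Exact rational arithmetic avoids floating-point surprises
        # before the ceiling is taken.
        quota = math.ceil(Fraction(percent) * len(members) / 100)
        selected.extend(rng.sample(members, quota))
    return selected


if __name__ == "__main__":
    cases = (
        [{"AGE": 1, "REGION": 1, "ID": i} for i in range(40)]
        + [{"AGE": 2, "REGION": 1, "ID": 100 + i} for i in range(41)]
    )
    picked = sample_by_groups(cases, ["AGE", "REGION"], 25)
    print(sum(c["AGE"] == 1 for c in picked))  # group of 40
    print(sum(c["AGE"] == 2 for c in picked))  # group of 41
```

With these inputs the group of 40 contributes 10 cases and the group of 41 contributes 11, for 21 in total, one more than 25% of the 81 cases rounded down. Calling the function without a seeded rng gives a different selection on each run, which mirrors the behaviour described for the command.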