TURF.OPTIONS
This helpfile describes the various identifiers within
P-STAT's TURF command.
General identifiers
- TURF xxx
This supplies the input filename.
Except for an optional weight variable,
all variables are treated as analysis items.
The values on the analysis items should
be zeros or positive numbers.
A positive value signifies a "hit".
Cases with any missing or negative values on
the analysis items are ignored.
The SET.MISS.TO.ZERO identifier, described
below, sets missing analysis items to zeros.
When case weighting is being used, any cases
with a missing, negative or zero value on
the weight variable are also ignored.
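The screening rules above can be sketched outside
P-STAT. This is an illustration only, not P-STAT
code: the function name usable_cases, the use of
None to stand for a missing value, and the sample
data are all assumptions made for the example.

```python
def usable_cases(cases, items, weight_var=None, set_miss_to_zero=False):
    """Return the cases that survive the screening rules described above.

    ASSUMPTION: a missing value is represented here as None; P-STAT has
    its own missing-value handling.
    """
    kept = []
    for case in cases:
        values = {item: case.get(item) for item in items}
        if set_miss_to_zero:
            # Mimics SET.MISS.TO.ZERO: only analysis items are changed.
            values = {k: (0 if v is None else v) for k, v in values.items()}
        # Any missing or negative analysis item drops the whole case.
        if any(v is None or v < 0 for v in values.values()):
            continue
        if weight_var is not None:
            weight = case.get(weight_var)
            # Missing, negative or zero weights drop the case.
            if weight is None or weight <= 0:
                continue
            values[weight_var] = weight
        kept.append(values)
    return kept

sample = [
    {"a": 1, "b": 0, "w": 2.0},     # kept in every variant
    {"a": None, "b": 1, "w": 1.0},  # missing item
    {"a": 1, "b": -1, "w": 1.0},    # negative item
    {"a": 0, "b": 1, "w": 0.0},     # zero weight
]
print(len(usable_cases(sample, ["a", "b"])))
print(len(usable_cases(sample, ["a", "b"], set_miss_to_zero=True)))
print(len(usable_cases(sample, ["a", "b"], weight_var="w")))
```

The three calls show the plain run, the run with
missing items set to zero, and the run with a case
weight variable.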
- SIZE 6,
SIZE 4 to 7,
SIZE 6 to 3,
SIZE 4 6 8,
-
Required. Specifies which combination sizes to use.
One or more sizes can be done in one run.
They are done in the order given;
for SIZE 6 to 4, size 6 is done first.
The final report shows the result for each
size separately. The output files show the
best results from the first size, then the
second, and so forth.
Many (20 or more) sizes can be done in a run;
each size must be from 1 to 60, and there
should not be any repeated sizes in a run.
Note: some sizes cannot be run in a
reasonable amount of time. Consider 40 items.
Rough run times, depending on the number of
cases and on the options used:
Size  4 takes 91,390 iterations: seconds.
Size  6 takes 3.8 million iterations: minutes.
Size 10 takes 847 million iterations: an hour.
Size 15 takes 40 billion iterations: a day.
Size 20 takes 137 billion iterations: a week.
This command produced the above numbers.
DO #j = 1, 20;
PUT #j (combinations( 40,#j));
ENDDO $
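The same counts can be cross-checked outside P-STAT.
The sketch below is an illustration in Python, using
the standard math.comb function; the helper name
combination_counts is made up for this example.

```python
import math

N_ITEMS = 40  # number of analysis items in the example above

def combination_counts(n_items, sizes):
    """Return {size: number of ways to pick `size` items out of n_items}."""
    return {size: math.comb(n_items, size) for size in sizes}

counts = combination_counts(N_ITEMS, [4, 6, 10, 15, 20])
for size, count in counts.items():
    print(f"Size {size:2d}: {count:,} combinations")
```

The exact values behind the rounded figures are
91,390; 3,838,380; 847,660,528; 40,225,345,056;
and 137,846,528,820.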
The F2 key can be used to cause a TURF command
to abandon the current size being processed.
It will produce the report and the output
files for the sizes already completed.
-
REACH.THRESHOLD 2
-
Optional. The value can be fractional.
This permits the user to control
what constitutes a successful "reach".
The default is one; if a case has a positive
response on any of the items in a given
combination, that case is added to the reach
total for that set of items.
Using REACH.THRESHOLD 3, for example, means
a case needs a reach score of 3 or more
to count as reached by a given combination.
Having several responses increases a case's
reach score; weighting of either items or
responses can also affect the reach score.
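The effect of the threshold can be sketched as
follows. This is a rough illustration, not P-STAT's
actual computation: it ASSUMES unweighted 0/1 data
and takes the reach score to be the number of items
in the combination on which the case has a hit. The
function name reach_count and the data are
hypothetical.

```python
def reach_count(cases, combo, threshold=1.0):
    """Count cases whose reach score on `combo` meets the threshold.

    ASSUMPTION: reach score = sum of the case's 0/1 values over the
    items in the combination (no item or response weighting).
    """
    reached = 0
    for case in cases:
        score = sum(case[item] for item in combo)
        if score >= threshold:
            reached += 1
    return reached

cases = [
    {"a": 1, "b": 0, "c": 1},  # score 2
    {"a": 0, "b": 0, "c": 1},  # score 1
    {"a": 1, "b": 1, "c": 1},  # score 3
    {"a": 0, "b": 0, "c": 0},  # score 0
]
combo = ("a", "b", "c")
for threshold in (1, 2, 3):
    print(threshold, reach_count(cases, combo, threshold))
```

Raising the threshold from 1 to 2 to 3 lowers the
reach for this combination from 3 cases to 2 to 1.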
-
PROGRESS 5
Optional. Controls how often the progress
window or report line is updated.
The default is 1, which means every million
combinations. PROGRESS 0 turns it off.
-
SET.MISS.TO.ZERO
-
Optional. If used, missing analysis
values in the input file are set to zeros.
If needed, this saves having to write
some PPL as the file is read.
Identifiers that control the makeup
of the combinations to be used
-
USE list-of-vars min max
-
Optional. This provides a limitation on the
makeup of the combinations to be tried.
Of the variables whose names (or ranges)
follow USE, at least MIN of them and
at most MAX of them should be in every
combination that will be tried.
The MIN value can be zero.
Up to 30 such USE phrases can be given.
Combinations are used only if they pass the
constraints in every one of the USE phrases.
Each use of USE is followed by:
-
The names of the variables in the group.
Ranges, like TOPPING.1 TO TOPPING.8,
can be used.
-
The smallest number of those variables
that are required. Can be zero.
A combination must have AT LEAST that
many of the variables in the group.
-
The largest number of those variables
that may be used.
A combination may have AT MOST that many
of the variables in the group.
All of the group could be used if the
supplied number is equal to or larger
than the size of the group. Therefore,
using 999 is a vivid way of saying there
is no upper limit for the group.
For example:
TURF xxx, size 8,
use aaa bbb to ddd 1 999,
use eee to ggg jjj to mmm 2 4,
use yyy zzz 0 1 $
In the above command, the only combinations
that will be evaluated are those that have
at least one variable from the first group, and
at least two but no more than four variables
from the second group, and
no more than one variable from the third group.
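The way USE phrases filter combinations can be
sketched as follows. This is an illustration in
Python, not P-STAT's implementation; the item names,
the group limits, and the helper passes_use are all
made up for the example.

```python
from itertools import combinations

def passes_use(combo, use_phrases):
    """True if combo satisfies every (group, minimum, maximum) phrase."""
    chosen = set(combo)
    for group, minimum, maximum in use_phrases:
        n_from_group = len(chosen & set(group))
        if n_from_group < minimum or n_from_group > maximum:
            return False
    return True

items = ["a1", "a2", "b1", "b2", "b3", "c1", "c2"]
use_phrases = [
    (["a1", "a2"], 1, 999),      # at least one, no upper limit
    (["b1", "b2", "b3"], 2, 3),  # at least two, at most three
    (["c1", "c2"], 0, 1),        # at most one
]

all_combos = list(combinations(items, 4))
kept = [c for c in all_combos if passes_use(c, use_phrases)]
print(len(kept), "of", len(all_combos), "combinations pass")
```

In this small example, 17 of the 35 possible
size-4 combinations pass all three constraints.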
-
FORCE vars
-
Optional. Names or ranges of items that
should be part of every combination.
Suppose there are 30 items and size is 6;
without force, 593,775 combinations are done,
because we take 30 items 6 at a time.
If 2 items are forced, only 20,475
combinations will be done because the run
reduces to 28 items taken 4 at a time.
If size is 6 and all 6 items are forced,
just that one pass will be done.
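The arithmetic above can be checked with a short
sketch. This is an illustration in Python, not
P-STAT code; the item names and the helper
forced_combinations are hypothetical.

```python
import math
from itertools import combinations

def forced_combinations(items, size, forced):
    """Yield every combination of `size` items that contains all forced items."""
    free = [item for item in items if item not in forced]
    for rest in combinations(free, size - len(forced)):
        yield tuple(forced) + rest

items = [f"item{i}" for i in range(1, 31)]  # 30 items

unforced = math.comb(30, 6)  # 30 items taken 6 at a time
two_forced = sum(1 for _ in forced_combinations(items, 6, items[:2]))
all_forced = sum(1 for _ in forced_combinations(items, 6, items[:6]))
print(unforced, two_forced, all_forced)
```

The three counts match the text: 593,775 without
FORCE, 20,475 with two forced items, and 1 when
all six items are forced.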
Identifiers for various
kinds of weighting
-
CASE.WEIGHTS varname
Optional.
The named variable will be used as a
caseweight, and not as an analysis item.
-
ITEM.WEIGHTS filename
-
The default is to treat all of the items
the same, i.e., with weights of 1.
When ITEM.WEIGHTS is used, it should
be followed by the name of a P-STAT system
file which itself has exactly 2 variables.
In each record, the first variable has the
name of an item being used for the TURF
analysis, the second is the weight to be used
for that item. The first variable is therefore
character, and the second is numeric.
The file is not required to have a record
for every item. In other words, some items
can be given changed weights; others can
be left as is (i.e., still set to 1).
The file can have names and weights for
items not used in the current run; if so,
they are ignored.
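The lookup rules for the weights file can be
sketched as follows. This is an illustration in
Python, not P-STAT code. The item names and the
helper resolve_item_weights are hypothetical, and
exact-match name comparison is an assumption.

```python
def resolve_item_weights(items, weight_records):
    """Map each analysis item to its weight, defaulting to 1.

    weight_records is a list of (item_name, weight) pairs, standing in
    for the two-variable weights file. Names not among `items` are
    ignored; items with no record keep the default weight of 1.
    """
    lookup = dict(weight_records)
    return {item: lookup.get(item, 1.0) for item in items}

records = [("cheese", 2.0), ("olives", 0.5), ("anchovy", 3.0)]
weights = resolve_item_weights(["cheese", "olives", "ham"], records)
print(weights)
```

Here "ham" has no record and keeps weight 1, while
the "anchovy" record is ignored because that item
is not in the run.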
-
RESPONSE.WEIGHTS
-
The default is to store the input data
as zeros or ones, with one meaning a yes.
This option leaves the input values intact;
each should be zero (no) or a positive
value (not necessarily an integer) showing
the INTENSITY of a yes.
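The difference between the two storage modes can be
sketched as follows. This is an illustration in
Python, not P-STAT code; the helper stored_value is
hypothetical, and it ASSUMES that without
RESPONSE.WEIGHTS any positive input is stored as 1.

```python
def stored_value(raw, response_weights=False):
    """Return the value kept for one non-missing, non-negative input.

    ASSUMPTION: by default any positive input becomes 1 (a yes);
    with response_weights=True the input is kept as given.
    """
    if response_weights:
        return raw
    return 1 if raw > 0 else 0

inputs = [0, 1, 2.5, 0.25]
print([stored_value(v) for v in inputs])
print([stored_value(v, response_weights=True) for v in inputs])
```

The first list holds only zeros and ones; the
second keeps the original intensities.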
REACH.RESULTS and
FREQ.RESULTS
The following five identifiers are fully described in
TURF results.
-
REACH.RESULTS rrr
-
optional output P-STAT system file.
This file holds the combinations with the
best REACH values.
-
REACH.DETAILS cumulative.pct
This controls which lines are printed in the
REACH.RESULTS file to show the importance of
the items in a combination.
-
FREQ.RESULTS fff
optional output P-STAT system file.
This file holds the combinations with the
best FREQ values.
-
FREQ.DETAILS cumulative.pct
This controls which lines are printed in the
FREQ.RESULTS file to show the importance of
the items in a combination.
-
OMIT size pct.of.max.reach
This provides a way to drop some of the
statistics in the REACH.RESULTS and
FREQ.RESULTS files.
Other optional output file identifiers
The following four identifiers are fully described in
TURF files.
-
REACH.SUMMARY qqq
-
optional output P-STAT system file.
This file shows how many combinations
had each of the reach values that were found.
-
FREQ.SUMMARY qqq
optional output P-STAT system file.
This file shows how many combinations
had each of the freq values that were found.
-
FULL.OUTPUT fff
optional output P-STAT system file.
This has the results of ALL combinations
in the order that they were processed.
This should only be used in very small runs.
-
TEMPLATE ttt
optional output P-STAT system file.
This contains the names of the items
that comprised the best combination.
It is intended for the TURF.SCORES command.