scramble


SYNOPSIS

scramble  [type={LOO | LTO | LMO}; defaults to LOO]  \
    [runs=<number of runs; defaults to 20>]  \
    [groups=<number of groups; defaults to 5>]  \
    [pc=<number of PCs; defaults to the number of PCs of the current PLS model>]  \
    [max_bins=<maximum number of starting bins into which objects are divided; defaults to 1/3 of active objects>]  \
    [min_bins=<minimum number of ending bins into which objects are divided; defaults to 2>]  \
    [scramblings=<number of times objects are shuffled at each binning level>]  \
    [fit_order=<2 | 3; defaults to 3>]  \
    [critical_point=<r2(yy') value at which the fitted q2 or SE(cv) values are calculated; defaults to 0.85>]  \
    [print_runs={YES | NO; defaults to NO}]


DESCRIPTION

The scramble keyword challenges the robustness of a model by progressive scrambling of the Y responses, as proposed by Clark and Fox [1]. Objects are sorted by decreasing Y value (by the average of the Y values if multiple dependent variables are present), then grouped into bins according to the value of the max_bins parameter; by default, the number of bins is chosen so that each bin contains at least three objects. Y values are then scrambled inside each bin a number of times controlled by the scramblings parameter, and for each scrambling a PLS model and a CV model are computed according to the values of the pc, type, groups and runs parameters. At the end of each CV run, cross-validated q2 and SE(cv) are computed according to the following equations (see [1] for details):

$$ q^2 = 1 - \frac{\sum (y'_{exp} - y'_{pred})^2}{\sum (y'_{exp})^2} $$

$$ SE(cv) = \left[ \frac{\sum (y'_{exp} - y'_{pred})^2}{N - pc - 1} \right]^{1/2} $$
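
As a rough illustration of the two statistics above, the following Python sketch evaluates them for a single CV run. It is not Open3DQSAR code: the function names q2 and se_cv and the sample values are made up for this example, and the denominator of q2 is taken literally from the equation as the sum of squared y'exp values (any centering of Y is assumed to have happened beforehand).

```python
import math


def q2(y_exp, y_pred):
    """q2 = 1 - sum((y'exp - y'pred)^2) / sum(y'exp^2), read literally from the text."""
    press = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss = sum(e ** 2 for e in y_exp)
    return 1.0 - press / ss


def se_cv(y_exp, y_pred, pc):
    """SE(cv) = sqrt(sum((y'exp - y'pred)^2) / (N - pc - 1)), N = number of objects."""
    press = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    n = len(y_exp)
    return math.sqrt(press / (n - pc - 1))


# Hypothetical scrambled responses and CV predictions, for illustration only
y_exp = [-1.5, -0.5, 0.0, 0.5, 1.5]
y_pred = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(q2(y_exp, y_pred), se_cv(y_exp, y_pred, pc=1))
```

Note that SE(cv) is only defined when N - pc - 1 is positive, so the number of PCs must stay below N - 1.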


Once the number of scrambling runs set by the scramblings parameter has been carried out at a given binning level, the number of bins is decreased by one and the PLS/CV computation is repeated, until the min_bins value (default, 2) is reached. When the iterative process has completed, (max_bins - min_bins + 1) * scramblings q2 and SE(cv) values (see [1] for details) have been computed and stored, one for each of the CV models. These values are fitted against the corresponding r2(yy') values by a second or third order polynomial (third order is the default), as determined by the fit_order parameter, to obtain the q2 and SE(cv) values corresponding to a critical value of r2(yy') set by the critical_point parameter (default, 0.85). The latter values are an indicator of the robustness of the model against scrambling of the Y responses. If the print_runs parameter is set to YES, the q2 and SE(cv) values for the individual CV runs are printed on the main output. Plots of q2 and SE(cv) fitted against r2(yy') can be obtained with the plot type={SCRAMBLED_Q2_VS_R2 | SCRAMBLED_SECV_VS_R2} keyword.
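
The procedure above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the program's implementation: all function names are hypothetical, bins are assumed to be contiguous and of nearly equal size, r2(yy') is taken as the squared Pearson correlation between original and scrambled Y values, and the PLS/CV step that would yield q2 and SE(cv) for each scrambling is left out. The fit is an ordinary least-squares polynomial solved through the normal equations.

```python
import random


def pearson_r2(a, b):
    """Squared Pearson correlation, used here as r2(yy')."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)


def scramble_in_bins(y_sorted, n_bins, rng):
    """Shuffle Y values inside each of n_bins contiguous bins (assumed layout)."""
    n = len(y_sorted)
    out = []
    for b in range(n_bins):
        chunk = y_sorted[b * n // n_bins:(b + 1) * n // n_bins]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out


def progressive_scrambling(y, max_bins, min_bins, scramblings, seed=0):
    """Return one (n_bins, r2(yy')) pair per scrambling, from max_bins down to min_bins."""
    rng = random.Random(seed)
    y_sorted = sorted(y, reverse=True)  # objects sorted by decreasing Y
    results = []
    for n_bins in range(max_bins, min_bins - 1, -1):
        for _ in range(scramblings):
            y_scr = scramble_in_bins(y_sorted, n_bins, rng)
            results.append((n_bins, pearson_r2(y_sorted, y_scr)))
    return results


def polyfit(xs, ys, order):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[order]*x^order."""
    m = order + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):  # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))) / A[i][i]
    return coef


def value_at(coef, x):
    """Evaluate the fitted polynomial, e.g. at the critical_point value."""
    return sum(c * x ** k for k, c in enumerate(coef))


# Synthetic Y values, for illustration only
runs = progressive_scrambling([float(v) for v in range(1, 31)],
                              max_bins=10, min_bins=2, scramblings=5)
print(len(runs))  # (10 - 2 + 1) * 5 entries, one per scrambling
```

In a full workflow, each entry would also carry the q2 and SE(cv) of its CV model; calling polyfit on the r2(yy') values and the q2 values, then value_at at the critical point (0.85 by default), would give the fitted q2 reported as the robustness indicator.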

EXAMPLE

# the following command performs 10 scrambling runs at each binning level using LMO cross-validation (5 principal components, 5 groups, 20 runs)
scramble  pc=5  type=LMO  groups=5  runs=20  scramblings=10

# the following command performs 20 scrambling runs at each binning level using LOO cross-validation, extracting 3 principal components, with verbose output, a 2nd order polynomial fit and a critical point value of 0.8
scramble  pc=3  type=LOO  scramblings=20  print_runs=YES  fit_order=2  critical_point=0.8


REFERENCES

  1. Clark, R. D.; Fox, P. C. J. Comput.-Aided Mol. Des. 2004, 18, 563-576.


Last update:
May 31. 2015 20:39:42

Powered by
CMSimple - CMSimple-Styles


Get Open3DGRID at SourceForge.net. Fast, secure and Free Open Source software downloads



Would you like to align your
dataset? Try Open3DALIGN
Just wish to compute a MIF?
Try Open3DGRID