### scramble

#### SYNOPSIS

`scramble [type={LOO | LTO | LMO}; defaults to LOO] \`

`[runs=<number of runs; defaults to 20>] \`

`[groups=<number of groups; defaults to 5>] \`

`[pc=<number of PCs; defaults to the number of PCs of the current PLS model>] \`

`[max_bins=<maximum number of starting bins into which objects are divided; defaults to 1/3 of active objects>] \`

`[min_bins=<minimum number of ending bins into which objects are divided; defaults to 2>] \`

`[scramblings=<number of times objects are shuffled at each binning level>] \`

`[fit_order=<2 | 3; defaults to 3>] \`

`[critical_point=<r^2(yy') value at which the fitted q^2 or SE(cv) values are calculated; defaults to 0.85>] \`

`[print_runs={YES | NO}; defaults to NO]`

#### DESCRIPTION

The `scramble` keyword is used to challenge the robustness of a model by progressive scrambling of *Y* responses, as proposed by Clark and Fox [1]. Objects are sorted by decreasing *Y* value (the average of the *Y* values if multiple dependent variables are present), then grouped into bins according to the value of the `max_bins` parameter; by default, the number of bins is chosen so that each bin includes at least three objects. Subsequently, *Y* values are scrambled inside each bin a number of times controlled by the `scramblings` parameter, and for each scrambling a PLS and a CV model are computed according to the values of the `pc`, `type`, `groups` and `runs` parameters. At the end of each CV run, cross-validated $q^{2}$ and SE(cv) are computed according to the following equations (see [1] for details):

$$q^{2} = 1 - \frac{\sum \left(y'_{exp} - y'_{pred}\right)^{2}}{\sum \left(y'_{exp}\right)^{2}}$$

$$\mathrm{SE(cv)} = \left[\frac{\sum \left(y'_{exp} - y'_{pred}\right)^{2}}{N - pc - 1}\right]^{1/2}$$
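As an illustration only, the within-bin scrambling step and the two statistics above can be sketched in Python with NumPy. This is not the program's own code: the function names are hypothetical, the use of `np.array_split` to form near-equal bins is an assumption, $N$ is assumed to be the number of objects, and the equations are transcribed literally, with $y'$ taken exactly as passed in.

```python
import numpy as np


def scramble_within_bins(y, n_bins, rng):
    """Return a copy of y with values shuffled inside each bin.

    Objects are ranked by decreasing y and split into n_bins contiguous
    bins (near-equal sizes via np.array_split: an assumption).
    """
    y = np.asarray(y, dtype=float)
    order = np.argsort(-y, kind="stable")  # object indices, decreasing y
    y_scrambled = y.copy()
    for bin_idx in np.array_split(order, n_bins):
        # Permute the responses among the objects of this bin only.
        y_scrambled[bin_idx] = y[rng.permutation(bin_idx)]
    return y_scrambled


def q2(y_exp, y_pred):
    """q^2 = 1 - sum((y'_exp - y'_pred)^2) / sum(y'_exp^2)."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_exp - y_pred) ** 2)
    return float(1.0 - press / np.sum(y_exp ** 2))


def se_cv(y_exp, y_pred, n_pc):
    """SE(cv) = sqrt(sum((y'_exp - y'_pred)^2) / (N - pc - 1))."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_exp - y_pred) ** 2)
    n_objects = y_exp.size  # N, assumed to be the number of objects
    return float(np.sqrt(press / (n_objects - n_pc - 1)))
```

With many bins each response can only move to an object with a similar value, so the perturbation is mild; as the number of bins decreases, responses travel further and the scrambling becomes progressively harsher.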

After `scramblings` runs have been carried out, the number of bins is decreased by one and the PLS/CV computation is repeated, until the `min_bins` value is reached (default, 2). When the iterative process has completed, `(max_bins - min_bins + 1) * scramblings` $q^{2}$ and SE(cv) values (see [1] for details) have been computed and stored, one for each of the CV models.

These values are fitted against $r^{2}(yy')$ values by a second or third (the default) order polynomial, as determined by the `fit_order` parameter, to obtain the $q^{2}$ and SE(cv) values corresponding to a critical value of $r^{2}(yy')$ set by the `critical_point` parameter (default, 0.85). The latter values are an indicator of the robustness of the model against scrambling of *Y* responses.

If the `print_runs` parameter is set to `YES`, $q^{2}$ and SE(cv) values for the individual CV runs are printed on the main output. The plots of $q^{2}$ and SE(cv) fitted against $r^{2}(yy')$ can be obtained with the `plot` keyword:

```
plot type={SCRAMBLED_Q2_VS_R2 | SCRAMBLED_SECV_VS_R2}
```

#### EXAMPLE

```
# the following command performs 10 scrambling runs at each binning
# level using LMO cross-validation (5 principal components, 5 groups,
# 20 runs)
scramble pc=5 type=LMO groups=5 runs=20 scramblings=10

# the following command performs 20 scrambling runs at each binning
# level using LOO cross-validation, extracting 3 principal components,
# choosing verbose output, a 2nd order polynomial fit and a 0.8
# critical point value
scramble pc=3 type=LOO scramblings=20 print_runs=YES fit_order=2 critical_point=0.8
```
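The polynomial fit described in DESCRIPTION can be approximated outside the program with a short Python sketch, for instance to check values copied from a `print_runs=YES` listing. The function names are hypothetical, `r2_yy` is assumed to hold the squared correlation between original and scrambled responses for each stored CV model, and `np.polyfit` (ordinary least squares) is a stand-in for whatever fitting routine the program actually uses.

```python
import numpy as np


def n_stored_values(max_bins, min_bins, scramblings):
    """Number of (q^2, SE(cv)) pairs stored after all binning levels."""
    return (max_bins - min_bins + 1) * scramblings


def value_at_critical_point(r2_yy, stat, fit_order=3, critical_point=0.85):
    """Fit stat (q^2 or SE(cv)) against r^2(yy') and evaluate the
    fitted polynomial at the critical point."""
    coeffs = np.polyfit(r2_yy, stat, deg=fit_order)
    return float(np.polyval(coeffs, critical_point))


# Synthetic data lying exactly on a quadratic, for illustration only.
r2_yy = np.linspace(0.1, 1.0, 10)
q2_values = 2.0 * r2_yy ** 2 - r2_yy + 0.5
q2_at_08 = value_at_critical_point(
    r2_yy, q2_values, fit_order=2, critical_point=0.8
)
```

The defaults mirror those of the `fit_order` and `critical_point` parameters; the synthetic call at the end uses the settings of the second command above.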

#### REFERENCES

1. Clark, R. D.; Fox, P. C. *J. Comput.-Aided Mol. Des.* **2004**, *18*, 563-576.