cv  [type={LOO | LTO | LMO}; defaults to LOO]  \
    [runs=<number of runs; defaults to 20>]  \
    [groups=<number of groups; defaults to 5>]  \
    [pc=<number of PCs; defaults to the number of PCs of the current PLS model>]  \
    [file=<filename.sdf where results will be saved in SDF format>]


The cv keyword performs a cross-validation run once a PLS model has been obtained. The type keyword selects the cross-validation scheme: LOO (leave-one-out), LTO (leave-two-out) or LMO (leave-many-out). For LMO cross-validation, both the number of runs to be carried out and the number of groups into which the dataset is split can be set (e.g., 10 groups corresponds to leave-10%-out, 5 groups to leave-20%-out, etc.).

The pc keyword sets the number of PCs used during cross-validation. By default, the same number of principal components extracted when the PLS model was built is used; a lower number may be chosen, while requesting more PCs than the current PLS model contains causes an error message to be issued.

CV statistics (SDEP, q2), together with predicted values as a function of the number of PCs, are printed on the main output and can subsequently be plotted through the plot command. Additionally, if the file parameter is specified, the predicted values are saved to a file in SDF format, ready to be imported into molecular modeling software. By default, the cv module runs in parallel on multiprocessor machines, using all the CPU cores available in the system; a different number of cores may be set with the env n_cpus keyword before calling cv.
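The SDEP and q2 statistics reported by cv are conventionally defined as below, where y_i is the observed activity of object i, ŷ_i its cross-validated prediction (obtained while object i was left out), ȳ the mean observed activity and N the number of objects. These are the standard textbook definitions; how predictions from multiple LTO pairs or multiple LMO runs are combined before computing them is not described on this page. The last line restates the groups-to-percentage rule for G groups.

```latex
\mathrm{PRESS} = \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
\qquad
\mathrm{SDEP} = \sqrt{\frac{\mathrm{PRESS}}{N}}
\qquad
q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}

\text{LMO with } G \text{ groups} \;\Rightarrow\; \text{leave-}\tfrac{100}{G}\%\text{-out}
\quad (G = 10 \Rightarrow 10\%,\; G = 5 \Rightarrow 20\%,\; G = 4 \Rightarrow 25\%)
```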


# the following command performs a leave-one-out cross-validation run extracting 5 principal components, using the number of CPU cores previously set with env n_cpus (or all available cores if it has not been set)
cv  pc=5  type=LOO

# the following command performs a leave-two-out cross-validation run extracting 5 principal components using 2 CPU cores, saving results in the file results.sdf
env  n_cpus=2
cv  pc=5  type=LTO  file=results.sdf

# the following command performs a leave-many-out cross-validation run extracting 3 principal components, after splitting the dataset into 4 groups. 50 runs will be carried out, each with a different random group composition; 4 CPU cores will be used
env  n_cpus=4
cv  pc=3  type=LMO  groups=4  runs=50
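The held-out sets behind the three cross-validation schemes can be sketched in Python. This is an illustrative sketch, not Open3DQSAR's code: the function names loo_folds, lto_folds and lmo_folds are hypothetical, and dealing shuffled objects round-robin into groups is an assumption about how LMO groups are formed (Open3DQSAR's own group assignment may differ). The LMO part mirrors the last command above: each run draws a fresh random group composition, and every group is held out once per run.

```python
# Hypothetical illustration of CV partitioning; not Open3DQSAR's implementation.
import itertools
import random
from typing import Iterator, List, Optional


def loo_folds(n: int) -> Iterator[List[int]]:
    """Leave-one-out: each object is held out once, on its own."""
    for i in range(n):
        yield [i]


def lto_folds(n: int) -> Iterator[List[int]]:
    """Leave-two-out: every distinct pair of objects is held out once."""
    for i, j in itertools.combinations(range(n), 2):
        yield [i, j]


def lmo_folds(n: int, groups: int = 5, runs: int = 20,
              seed: Optional[int] = None) -> Iterator[List[List[int]]]:
    """Leave-many-out: for each run, shuffle the objects and deal them into
    `groups` groups of (nearly) equal size; each group is held out once.
    Round-robin dealing is an assumption made for this sketch."""
    if not 2 <= groups <= n:
        raise ValueError("groups must be between 2 and the number of objects")
    rng = random.Random(seed)
    for _ in range(runs):
        order = list(range(n))
        rng.shuffle(order)  # a different random group composition per run
        # Taking every `groups`-th object keeps group sizes within one of each other.
        yield [sorted(order[g::groups]) for g in range(groups)]


if __name__ == "__main__":
    n_objects = 10
    print(sum(1 for _ in loo_folds(n_objects)))  # 10 held-out sets
    print(sum(1 for _ in lto_folds(n_objects)))  # 45 held-out pairs
    for run in lmo_folds(n_objects, groups=4, runs=2, seed=0):
        print([len(g) for g in run])             # [3, 3, 2, 2]
```

With 10 objects and groups=4, each run holds out groups of 3, 3, 2 and 2 objects, i.e. roughly 25% of the dataset at a time; LTO, by contrast, grows quadratically (N(N-1)/2 pairs), which is one reason parallel execution matters.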


Last update: May 31, 2015 20:39:42
