ffdsel


SYNOPSIS

ffdsel  [type={LOO | LTO | LMO | EXTERNAL}; defaults to LOO]  \
    [runs=<number of runs; defaults to 20>]  \
    [groups=<number of groups; defaults to 5>]  \
    [pc=<number of PCs; defaults to the number of PCs of the current PLS model>]  \
    [percent_dummies=<0-50; defaults to 20>]  \
    [use_srd_groups={YES | NO; defaults to NO}]  \
    [retain_uncertain={YES | NO; defaults to YES}]  \
    [fold_over={YES | NO; defaults to NO}]  \
    [combination_variable_ratio=<1.0-10.0; defaults to 2.0>]  \
    [confidence_level=<80.0-99.0; defaults to 99.0>]  \
    [print_sdep={YES | NO; defaults to NO}]  \
    [print_effect={YES | NO; defaults to NO}]


DESCRIPTION

The ffdsel keyword carries out variable selection by Fractional Factorial Design (FFD), as implemented in GOLPE [2, 4]. The rationale of the procedure is to select a subset of variables whose impact on the cross-validated q2 is favorable. The user controls a number of parameters, first of all the number of PCs and the type of cross-validation used during variable selection. Open3DQSAR adds a further possibility: external validation may be used as an alternative to internal cross-validation (type=external). With type=external, the FFD procedure selects the subset of variables having the most favorable impact on the SDEP of an external test set. All of the parameters controlling the FFD variable selection are those defined by Baroni et al. in their original implementation.

By default, the ffdsel module operates in parallel fashion on multiprocessor machines, using all the CPUs available in the system; if one wishes to run the computation on a smaller number of CPUs, this may be specified with the env n_cpus keyword before calling ffdsel.
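
The selection rationale described in this section can be illustrated with a minimal Python sketch. It is not Open3DQSAR or GOLPE code: the function name variable_effects, the toy full factorial design, and the SDEP values are all hypothetical, and the sketch assumes that the effect of a variable is the difference between the mean SDEP of the models that include it and the mean SDEP of those that exclude it. The comparison against dummy variables and the confidence-level test are deliberately left out.

```python
from itertools import product


def variable_effects(design, sdep):
    """Main effect of each variable on SDEP in a two-level design.

    design: list of rows; each entry is +1 (variable included) or -1 (excluded)
    sdep:   list of SDEP values, one per row (one per evaluated model)
    """
    n_vars = len(design[0])
    effects = []
    for j in range(n_vars):
        included = [s for row, s in zip(design, sdep) if row[j] == 1]
        excluded = [s for row, s in zip(design, sdep) if row[j] == -1]
        # A negative effect means that including the variable lowers SDEP
        effects.append(sum(included) / len(included)
                       - sum(excluded) / len(excluded))
    return effects


# Toy full 2^3 factorial design (a real run would use a fractional design)
design = list(product((-1, 1), repeat=3))

# Hypothetical SDEP values: variable 0 helps, variable 1 hurts, variable 2 is inert
sdep = [1.0 - 0.2 * (row[0] == 1) + 0.1 * (row[1] == 1) for row in design]

effects = variable_effects(design, sdep)
for j, effect in enumerate(effects):
    label = "favorable" if effect < 0 else "unfavorable" if effect > 0 else "neutral"
    print(f"variable {j}: effect on SDEP = {effect:+.3f} ({label})")
```

In this toy case variable 0 would be retained and variable 1 removed; a variable whose effect cannot be distinguished from noise corresponds to the "uncertain" case governed by retain_uncertain.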

EXAMPLES

# This command invokes FFD selection using LOO cross-validation, extracting 3 principal components. 20% dummy variables are included in the FFD matrix and a combination/variable ratio of 2.0 is used; this means that if the model has 2560 active variables, 2.0 * 4096 = 8192 models will be evaluated. The number of CPU cores previously set by the env n_cpus keyword is used.
ffdsel  pc=3  type=LOO  percent_dummies=20  combination_variable_ratio=2.0
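
The arithmetic in the example above can be reproduced with a short Python sketch. The sizing rule is an assumption inferred only from the numbers quoted in the comment (2560 active variables, 4096, 8192): dummies are counted as a percentage of the active variables, and the total is rounded up to the next power of two before being multiplied by the combination/variable ratio. The function name ffd_model_count is hypothetical, and the actual rule used by Open3DQSAR may differ.

```python
import math


def ffd_model_count(active_vars, percent_dummies, ratio):
    """Estimate how many models an FFD run evaluates (assumed rule)."""
    # Assumption: dummies are a percentage of the active variables
    n_dummies = math.ceil(active_vars * percent_dummies / 100)
    total_vars = active_vars + n_dummies
    # Assumption: the design size is the next power of two >= total_vars
    design_size = 1 << (total_vars - 1).bit_length()
    return int(ratio * design_size)


# 2560 active variables, 20% dummies, combination/variable ratio 2.0
print(ffd_model_count(2560, 20, 2.0))
```

Under these assumptions, 2560 active variables plus 512 dummies give 3072 variables, which rounds up to 4096 and yields 2.0 * 4096 = 8192 models, matching the figure quoted above.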

# This command invokes FFD selection using LMO cross-validation (5 groups, 100 runs), extracting 5 principal components. 20% dummy variables are included in the FFD matrix and a combination/variable ratio of 1.0 is used. Previously computed SRD groups are taken into account instead of single variables. 2 CPU cores are used.
env  n_cpus=2
ffdsel  pc=5  type=LMO  groups=5  runs=100  \
    percent_dummies=20  combination_variable_ratio=1.0  \
    use_srd_groups=yes 

# This command invokes FFD selection using LMO cross-validation (4 groups, 50 runs), extracting 5 principal components. 10% dummy variables are included in the FFD matrix and a combination/variable ratio of 2.0 is used. Previously computed SRD groups are taken into account instead of single variables. A fold-over design is chosen and uncertain variables are removed. Full details about the SDEPs and effects of individual models are printed to the main output. 4 CPU cores are used.
env  n_cpus=4
ffdsel  pc=5  type=LMO  groups=4  runs=50  \
    percent_dummies=10  combination_variable_ratio=2.0  use_srd_groups=yes  \
    fold_over=yes  retain_uncertain=no  print_sdep=yes  print_effect=yes


REFERENCES

  1. De Aguiar, P. F.; Bourguignon, B.; Khots, M. S.; Massart, D. L.; Phan-Than-Luu, R. Chemometrics Intell. Lab. Syst. 1995, 30, 199-210.
  2. Baroni, M.; Costantino, G.; Cruciani, G.; Riganelli, D.; Valigi, R.; Clementi, S. Quant. Struct.-Act. Relat. 1993, 12, 9-20.
  3. Johnson, M. E.; Nachtsheim, C. J. Technometrics 1983, 25, 271-277.
  4. Baroni, M.; Clementi, S.; Cruciani, G.; Costantino, G.; Riganelli, D. J. Chemometr. 1992, 6, 347-356.
  5. Box, G. E. P.; Hunter, J. S.; Hunter, W. G. Statistics for Experimenters: Design, Innovation, and Discovery, 2nd ed. 2005, Wiley-VCH, Weinheim.


Last update:
May 31. 2015 20:39:42
