### ffdsel

#### SYNOPSIS

`ffdsel [type={LOO | LTO | LMO | EXTERNAL}; defaults to LOO] \`

`[runs=<number of runs; defaults to 20>] \`

`[groups=<number of groups; defaults to 5>] \`

`[pc=<number of PCs; defaults to the number of PCs of the current PLS model>] \`

`[percent_dummies=<0-50; defaults to 20>] \`

`[use_srd_groups={YES | NO; defaults to NO}] \`

`[retain_uncertain={YES | NO; defaults to YES}] \`

`[fold_over={YES | NO; defaults to NO}] \`

`[combination_variable_ratio=<1.0-10.0; defaults to 2.0>] \`

`[confidence_level=<80.0-99.0; defaults to 99.0>] \`

`[print_sdep={YES | NO; defaults to NO}] \`

`[print_effect={YES | NO; defaults to NO}]`

#### DESCRIPTION

The `ffdsel` keyword carries out a variable selection according to Fractional Factorial Design (FFD), as implemented in GOLPE [2, 4]. The rationale of this procedure is to select a subset of variables whose impact on the cross-validated *q*² is favorable. The user has control over a number of parameters, first of all the number of principal components (`pc`) and the `type` of cross-validation used in the variable selection procedure. **Open3DQSAR** adds a further possibility: external validation can be used as an alternative to internal cross-validation to carry out the variable selection (`type=external`). With `type=external`, the FFD procedure selects the subset of variables having the most favorable impact on the SDEP of an external test set. All of the parameters controlling the FFD variable selection are those defined by Baroni et al. in their original implementation:

- `percent_dummies`: the percentage of dummy variables which should be included in the FFD matrix
- `use_srd_groups`: a flag which specifies whether single variables will be included in the models or, instead, the groups of variables identified by a previously executed SRD procedure
- `retain_uncertain`: a flag which specifies whether variables having an uncertain effect on predictivity at the chosen confidence level should be retained or excluded from the model
- `fold_over`: a flag which specifies whether a fold-over type of FFD should be used (refer to [2] or [5] for further details)
- `combination_variable_ratio`: this parameter controls how many models will be built using different combinations of variables; in particular, the total number of PLS models is equal to the power of two nearest to the number of active variables (or active SRD groups, if the `use_srd_groups` flag was set) times the `combination_variable_ratio` coefficient. The higher the number of tested models, the more accurate the choice of the subset of variables achieving the best predictive performance
- `confidence_level`: the confidence level used in considering a variable as favorable, detrimental or uncertain with respect to the predictive power of the model
- `print_sdep`: a flag to toggle verbose output of the SDEP values of all the individual models
- `print_effect`: a flag to toggle verbose output of the effects, computed by the Yates algorithm, of each active variable
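
To give a rough idea of what an effect computed by the Yates algorithm is, the following Python sketch applies the textbook algorithm to a full two-level factorial design. It is not Open3DQSAR's code: the function names and the sample SDEP values are hypothetical, and the actual procedure works on a fractional design which also contains dummy variables.

```python
def yates_contrasts(responses):
    """Textbook Yates algorithm for a full 2^k factorial design.

    `responses` must be in standard (Yates) order, e.g. for k=2:
    (1), a, b, ab. Returns the grand total followed by the contrasts
    in the same standard order (A, B, AB, ...).
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n < 2 or (1 << k) != n:
        raise ValueError("number of responses must be a power of two >= 2")
    column = list(responses)
    for _ in range(k):
        # First half: sums of consecutive pairs; second half: differences.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    return column


def yates_effects(responses):
    """Effects = contrasts divided by half the number of runs."""
    contrasts = yates_contrasts(responses)
    half_runs = len(responses) / 2
    return [c / half_runs for c in contrasts[1:]]


# Hypothetical SDEP values of four models built with variables a and b
# excluded or included, in standard order: (1), a, b, ab.
sdep_values = [10, 14, 12, 20]
print(yates_contrasts(sdep_values))
print(yates_effects(sdep_values))
```

Assuming the response is the SDEP and the high level of a factor means that the variable is included, a negative effect indicates a variable which lowers the prediction error, while a positive effect indicates a detrimental one.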

The `ffdsel` module operates in parallel on multiprocessor machines, using all the CPUs available in the system; to run the computation on a smaller number of CPUs, specify it with the `env n_cpus` keyword before calling `ffdsel`.
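
For reference, the SDEP (standard deviation of the error of predictions) which drives the selection and is reported by `print_sdep` is conventionally the root mean squared difference between observed and predicted activities. The Python sketch below illustrates that conventional definition only; the function name and the sample values are hypothetical and it does not reproduce Open3DQSAR's internal code.

```python
import math


def sdep(observed, predicted):
    """Standard deviation of the error of predictions: sqrt(PRESS / N)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    # PRESS: predictive residual sum of squares
    press = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(press / len(observed))


# Hypothetical observed and predicted activities of a four-compound test set
print(sdep([5.0, 6.0, 7.0, 8.0], [7.0, 8.0, 9.0, 10.0]))
```
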
#### EXAMPLES

```
# This command invokes FFD selection using LOO cross-validation, extracting
# 3 principal components. 20% dummy variables are included in the FFD matrix
# and a 2.0 combination/variable ratio is used; this means that if the model
# has 2560 active variables, 2.0 * 4096 = 8192 models will be evaluated. The
# number of CPU cores previously set by the env n_cpus keyword is used.

ffdsel pc=3 type=LOO percent_dummies=20 combination_variable_ratio=2.0

# This command invokes FFD selection using LMO cross-validation (5 groups,
# 100 runs), extracting 5 principal components. 20% dummy variables are
# included in the FFD matrix and a 1.0 combination/variable ratio is
# used. SRD groups previously computed will be taken into account instead
# of single variables. 2 CPU cores are used.

env n_cpus=2

ffdsel pc=5 type=LMO groups=5 runs=100 \
    percent_dummies=20 combination_variable_ratio=1.0 \
    use_srd_groups=yes

# This command invokes FFD selection using LMO cross-validation (4 groups,
# 50 runs), extracting 5 principal components. 10% dummy variables are
# included in the FFD matrix and a 2.0 combination/variable ratio is
# used. SRD groups previously computed will be taken into account instead
# of single variables. Fold-over design is chosen and uncertain variables
# are removed. Full details about SDEPs and effects of individual models
# are printed to the main output. 4 CPU cores are used.

env n_cpus=4

ffdsel pc=5 type=LMO groups=4 runs=50 \
    percent_dummies=10 combination_variable_ratio=2.0 \
    use_srd_groups=yes retain_uncertain=no \
    fold_over=yes print_sdep=yes print_effect=yes
```
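
The model count quoted in the first example (2560 active variables, ratio 2.0, 8192 models) can be reproduced with the short Python sketch below. It assumes that the number of active variables is rounded up to the next power of two, which is what the 2560 → 4096 figure suggests; this rounding rule and the function names are assumptions made for illustration, not confirmed Open3DQSAR behavior.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (assumed rounding rule)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


def estimated_model_count(active_variables: int,
                          combination_variable_ratio: float) -> int:
    """Number of PLS models under the rounding assumption above."""
    return int(combination_variable_ratio * next_power_of_two(active_variables))


# Figures from the first example: 2560 active variables, ratio 2.0
print(estimated_model_count(2560, 2.0))
```

Under this assumption, halving `combination_variable_ratio` halves the number of PLS models to be evaluated, which is a simple way to estimate the cost of a run before launching it.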

#### REFERENCES

1. De Aguiar, P. F.; Bourguignon, B.; Khots, M. S.; Massart, D. L.; Phan-Than-Luu, R. *Chemometrics Intell. Lab. Syst.* **1995**, *30*, 199-210.
2. Baroni, M.; Costantino, G.; Cruciani, G.; Riganelli, D.; Valigi, R.; Clementi, S. *Quant. Struct.-Act. Relat.* **1993**, *12*, 9-20.
3. Johnson, M. E.; Nachtsheim, C. J. *Technometrics* **1983**, *25*, 271-277.
4. Baroni, M.; Clementi, S.; Cruciani, G.; Costantino, G.; Riganelli, D. *J. Chemometr.* **1992**, *6*, 347-356.
5. Box, G. E. P.; Hunter, J. S.; Hunter, W. G. *Statistics for Experimenters: Design, Innovation, and Discovery, 2nd ed.* **2005**, Wiley-VCH, Weinheim.