[pc=<number of PCs; defaults to the number of PCs of the current PLS
model>] \
[percent_remove=<percent of variables to remove; defaults to 50> \
| design_points=<0.1 * number of variables - (number of variables
- 1); \
defaults to 0.5 * number of variables>] \
[type={WEIGHTS | LOADINGS; defaults to WEIGHTS}]
This keyword is used to carry out a variable selection according to
D-optimal design, i.e. to select an ensemble of variables (whose size
is defined by the user) so as to minimize the determinant of the
dispersion matrix (X'X)^-1, which is equivalent to maximizing the
determinant of the information matrix X'X. The D-optimal algorithm can
operate in either the space of PLS partial weights (type=WEIGHTS) or
PLS loadings (type=LOADINGS). An excellent tutorial on D-optimal
design has been written by De Aguiar et al. [1], while for a review of
its application in the 3D-QSAR field one may refer to the work by
Baroni et al. [2]. In the Open3DQSAR implementation the D-optimal
design is obtained by means of an algorithm described by Johnson and
Nachtsheim [3]. The user has two ways to specify the extent to which
the variable selection should be carried out: one is supplying the
percent_remove option with the percentage of variables which should be
eliminated as a parameter; the other is supplying the exact number of
variables which should be retained as a parameter to the design_points
option. In both cases one is not allowed to remove more than 90% of
the original variables.
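To make the selection criterion concrete, the following is a minimal Python sketch of a D-optimal selection by simple point exchange. It is an illustration only and is not the algorithm used by Open3DQSAR: the function name `d_optimal_select`, its arguments, and the exchange strategy are all assumptions introduced here. `W` stands for a (variables x PCs) matrix of PLS weights or loadings, and the sketch looks for the subset of rows that maximizes det(X'X), i.e. minimizes the determinant of the dispersion matrix.

```python
import numpy as np


def d_optimal_select(W, n_keep, start=None, n_iter=20, seed=0):
    """Illustrative point-exchange search (hypothetical helper).

    W      : array of shape (n_variables, n_pcs), weights or loadings
    n_keep : number of variables (rows of W) to retain
    start  : optional initial list of row indices; random if omitted
    Returns the sorted retained indices and the final log-determinant.
    """
    n_vars = W.shape[0]
    if start is None:
        rng = np.random.default_rng(seed)
        start = rng.choice(n_vars, size=n_keep, replace=False)
    selected = [int(i) for i in start]

    def log_det(rows):
        # log det of the information matrix X'X for the chosen rows
        sign, value = np.linalg.slogdet(W[rows].T @ W[rows])
        return value if sign > 0 else -np.inf

    best = log_det(selected)
    for _ in range(n_iter):
        improved = False
        for pos in range(n_keep):
            for cand in range(n_vars):
                if cand in selected:
                    continue
                trial = selected.copy()
                trial[pos] = cand
                score = log_det(trial)
                # accept a swap only if it strictly improves the criterion
                if score > best + 1e-12:
                    selected, best = trial, score
                    improved = True
        if not improved:
            break  # no single exchange helps: local optimum reached
    return sorted(selected), best
```

Because only improving exchanges are accepted, the result is a local optimum that depends on the starting subset; practical implementations typically use more efficient determinant updates than the full recomputation shown here.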
# the following command performs a D-optimal variable selection
# operating in the space of PLS partial weights, taking into account
# the first 3 principal components, with the aim of removing 40% of
# the original variables
type=WEIGHTS pc=3 percent_remove=40
# the following command performs a D-optimal variable selection
# operating in the space of PLS loadings, taking into account the same
# number of principal components extracted when the PLS model was
# built, with the aim of retaining 1500 variables
type=LOADINGS design_points=1500
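The relation between the two ways of specifying the extent of the selection can be sketched in Python as follows. The helper name `retained_variables` is hypothetical, and treating the two options as mutually exclusive, the truncation of fractional counts, and the exact error handling are assumptions; only the defaults (50% removed) and the limits (at least 10% of the variables retained, at most the number of variables minus one) are taken from the synopsis above.

```python
def retained_variables(n_vars, percent_remove=None, design_points=None):
    """Return how many variables would be retained (hypothetical helper)."""
    if percent_remove is not None and design_points is not None:
        # assumption: the two options are alternatives
        raise ValueError("give percent_remove or design_points, not both")
    if percent_remove is None and design_points is None:
        percent_remove = 50  # documented default
    if percent_remove is not None:
        if not 0 < percent_remove <= 90:
            raise ValueError("at most 90% of the variables may be removed")
        # assumption: fractional counts are truncated
        return n_vars - int(n_vars * percent_remove / 100)
    # integer comparison avoids floating-point error in 0.1 * n_vars
    if not (10 * design_points >= n_vars and design_points <= n_vars - 1):
        raise ValueError("design_points outside the allowed range")
    return design_points
```

With 10000 original variables, for instance, removing 40% retains 6000 variables, while asking for 1500 design points retains exactly 1500.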
- De Aguiar, P. F.; Bourguignon, B.; Khots, M. S.; Massart, D. L.;
  Phan-Than-Luu, R. Chemometrics Intell. Lab. Syst. 1995,
- Baroni, M.; Costantino, G.; Cruciani, G.; Riganelli, D.; Valigi, R.;
  Clementi, S. Quant. Struct.-Act. Relat. 1993, 12, 9-20.
- Johnson, M. E.; Nachtsheim, C. J. 1983, 25, 271-277.