Open3DQSAR's workflow always begins by importing a set of molecular structures with their respective biological activities through the import type=DEPENDENT keywords. Afterwards, one may calculate molecular interaction fields (MIFs) through the calc_field keyword, import them from different sources through the import keyword, or use a combination of the two.
Once all MIFs have been gathered, Open3DQSAR allows one to include all of them or just a selection, in order to evaluate their impact on the model. The objects to be included in the model can also be chosen, in particular by assigning each of them either to the training set or to an external test set. The latter option makes it possible to carry out external predictions once a model has been obtained.
Open3DQSAR can perform a variety of chemometric analyses on imported MIFs, ranging from standard variable pretreatment to more advanced variable selection procedures.
Available pretreatment operations include:
- zeroing (sets to zero the grid values which are close to zero)
- max/min cut-off (sets to user-defined maximum/minimum threshold values the grid points lying respectively above or below these boundaries)
- exclusion of grid points which exceed the cut-off in a specific MIF (e.g., this makes it possible to exclude from the chemometric analysis the grid points which lie very close to atom nuclei and therefore assume high steric energy values)
- standard deviation cut-off (removes variables having a standard deviation among different objects lower than a user-defined threshold, in order to improve the signal-to-noise ratio)
- N-level variable elimination (removes variables assuming only a few different values across the different objects to prevent them from biasing the model)
- scaling operations (autoscaling, scaling of a whole block of X or Y variables by a user-defined coefficient, or according to the Block Unscaled Weighting procedure).
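Several of the pretreatment operations listed above can be illustrated with a minimal, standard-library Python sketch. This is not Open3DQSAR code: the function names, the thresholds and the tiny data matrix (rows are objects, columns are grid points) are all hypothetical and chosen only for illustration, and the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def columns(X):
    """Transpose a row-major matrix into a list of columns."""
    return [list(col) for col in zip(*X)]

def zero_small(X, eps):
    """Zeroing: set to zero every value whose magnitude is below eps."""
    return [[0.0 if abs(v) < eps else v for v in row] for row in X]

def clip(X, lo, hi):
    """Max/min cut-off: clamp every value to the interval [lo, hi]."""
    return [[min(max(v, lo), hi) for v in row] for row in X]

def sd_cutoff(X, min_sd):
    """Indices of variables whose sample SD across objects is >= min_sd."""
    return [j for j, col in enumerate(columns(X)) if stdev(col) >= min_sd]

def n_level(X, min_levels):
    """Indices of variables taking at least min_levels distinct values."""
    return [j for j, col in enumerate(columns(X)) if len(set(col)) >= min_levels]

def autoscale(X):
    """Centre each column and scale it to unit SD (constant columns -> 0)."""
    scaled = []
    for col in columns(X):
        m, s = mean(col), stdev(col)
        scaled.append([(v - m) / s if s > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]

# Hypothetical MIF matrix: 3 objects x 4 grid points.
X = [[0.01, 50.0, 1.0, 2.0],
     [-0.02, 3.0, 1.0, 4.0],
     [0.03, -40.0, 1.2, 6.0]]

Z = zero_small(X, 0.05)          # near-zero values become exactly zero
C = clip(Z, -30.0, 30.0)         # extreme energies are truncated
kept = [j for j in sd_cutoff(C, 0.5) if j in n_level(C, 3)]
A = autoscale([[row[j] for j in kept] for row in C])
print(kept)
```

The order used here (zeroing, cut-off, variable removal, scaling) is only one plausible sequence; the text above does not prescribe one.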
Subsequently, once an initial PLS model has been obtained, one can challenge its predictive performance against an external test set or by internal cross-validation, using the leave-one-out, leave-two-out and leave-many-out paradigms. Furthermore, the robustness of the model can be ascertained through the progressive scrambling procedure previously described by Clark and Fox [2].
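To make the leave-one-out idea concrete, here is a minimal sketch in plain Python. It assumes a one-component PLS1 model and the common definition q² = 1 − PRESS/SS, with SS computed around the mean of all activities; both are simplifying assumptions, and the function names and data are hypothetical rather than taken from Open3DQSAR.

```python
def fit_pls1(X, y):
    """Fit a one-component PLS1 model on mean-centred data.

    Returns a function that predicts y for a single object x.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: proportional to X'y, normalised to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        return lambda x: y_mean  # no covariance: predict the mean
    w = [v / norm for v in w]
    # Scores t = Xw and inner regression coefficient b = t'y / t't.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    b = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + b * score

    return predict

def loo_q2(X, y):
    """Leave-one-out q2: refit without each object, then predict it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - model(X[i])) ** 2
    y_mean = sum(y) / n
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss

# Hypothetical data: one descriptor, activity exactly proportional to it.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]
q2_true = loo_q2(X, y)
# Crude sanity check with a permuted y (NOT progressive scrambling).
q2_perm = loo_q2(X, [6.0, 2.0, 8.0, 4.0])
print(q2_true, q2_perm)
```

Leave-two-out and leave-many-out follow the same pattern, with more than one object held out per refit.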
The predictive power of a model can usually be improved by applying appropriate variable selection procedures. A number of them have been implemented in Open3DQSAR, namely:
- Smart Region Definition (SRD), as previously described by Pastor and co-workers [5]. SRD groups variables on the basis of their original location in three-dimensional space; this procedure reduces the redundancy arising from multiple nearby descriptors which essentially encode the same kind of information
- Fractional factorial design (FFD) variable selection, as originally described by Baroni et al. and implemented in GOLPE [4, 6]. FFD selection aims at selecting the variables which have the largest effect on predictivity, and can operate either on single variables or on groups identified by a previous SRD run
- UVE-PLS variable selection, as originally described by Centner and co-workers [7], as well as the modified iterative IVE-PLS methodology developed by Polanski and colleagues [8]. These procedures remove the least informative variables, i.e., those characterized by small PLS pseudo-coefficients. The Open3DQSAR implementation of UVE/IVE-PLS has been further augmented by including the possibility to use other cross-validation paradigms in addition to the leave-one-out scheme originally proposed by Centner, as recently suggested by Grohmann and Schindler [9]. Additionally, the algorithms can operate either on single variables or on SRD groups, just as for FFD selection
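The spatial grouping idea behind SRD can be sketched as follows. This is a deliberately simplified illustration, not the published algorithm or Open3DQSAR's implementation: it assumes that a set of seed points is already available and only shows how grid points could be assigned to the nearest seed, so that each group can later be handled as a single unit by a selection procedure. All names and coordinates are hypothetical.

```python
from math import dist

def group_by_nearest_seed(coords, seeds):
    """For each grid point, return the index of the closest seed."""
    return [min(range(len(seeds)), key=lambda k: dist(p, seeds[k]))
            for p in coords]

def groups_as_lists(assignment, n_seeds):
    """Turn a per-variable assignment into lists of variable indices."""
    groups = [[] for _ in range(n_seeds)]
    for var_index, seed_index in enumerate(assignment):
        groups[seed_index].append(var_index)
    return groups

# Hypothetical grid points (x, y, z) and two hypothetical seeds.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (9.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
seeds = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]

assignment = group_by_nearest_seed(coords, seeds)
groups = groups_as_lists(assignment, len(seeds))
print(groups)
```

A group-wise selection step would then keep or discard all variables of a group together, which is the sense in which FFD and UVE/IVE-PLS are said above to operate on SRD groups.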
1. Kastenholz, M. A.; Pastor, M.; Cruciani, G.; Haaksma, E. E. J.; Fox, T. J. Med. Chem. 2000, 43, 3033-3044.
2. Clark, R. D.; Fox, P. C. J. Comput.-Aided Mol. Des. 2004,
3. De Aguiar, P. F.; Bourguignon, B.; Khots, M. S.; Massart, D. L.; Phan-Than-Luu, R. Chemometrics Intell. Lab. Syst. 1995, 30, 199-210.
4. Baroni, M.; Costantino, G.; Cruciani, G.; Riganelli, D.; Valigi, R.; Clementi, S. Quant. Struct.-Act. Relat. 1993, 12, 9-20.
5. Pastor, M.; Cruciani, G.; Clementi, S. J. Med. Chem. 1997, 40, 1455-1464.
6. Baroni, M.; Clementi, S.; Cruciani, G.; Costantino, G.; Riganelli, D. J. Chemometr. 1992, 6, 347-356.
7. Centner, V.; Massart, D. L.; de Noord, O. E.; de Jong, S.; Vandeginste, B. M.; Sterna, C. Anal. Chem. 1996, 68, 3851-3858.
8. Gieleciak, R.; Polanski, J. J. Chem. Inf. Model. 2007, 47, 547-556.
9. Grohmann, R.; Schindler, T. J. Comput. Chem. 2008, 29, 847-860.