New version of the MDR method for stratified samples

  • Alexander Bulinski, Lomonosov Moscow State University, Russia
  • Alexey Kozhevin, Lomonosov Moscow State University, Russia
Keywords: Feature selection, MDR method, Error function estimation, Cross-validation, Stratified sample, Cost approach

Abstract

A new version of the MDR (multifactor dimensionality reduction) method for identifying relevant factors within a given collection X_1, ..., X_n is introduced for stratified samples in the case of a binary response variable Y. We establish a criterion of strong consistency of estimates (involving a K-fold cross-validation procedure and a penalty) for a specified prediction error function. A cost approach is proposed to compare experiments with random and nonrandom numbers of observations. Analytic results are accompanied by simulations.
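To make the setting concrete, the following is a minimal sketch of a K-fold cross-validated error estimate for a candidate subset of factors on a stratified sample (fixed numbers of cases and controls). It is not the estimator analysed in the paper: the penalty is replaced here by the balanced error commonly used in the MDR literature, responses are coded 0/1, and all function names, the toy XOR model and the sample sizes are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_stratified_sample(n_cases, n_controls, seed=0):
    """Toy data (assumption): 4 binary factors, Y = X_0 XOR X_1.
    Observations are drawn until the fixed class counts are reached."""
    rng = random.Random(seed)
    target = {0: n_controls, 1: n_cases}
    counts = {0: 0, 1: 0}
    data = []
    while counts[0] < n_controls or counts[1] < n_cases:
        x = tuple(rng.randint(0, 1) for _ in range(4))
        y = x[0] ^ x[1]
        if counts[y] < target[y]:
            data.append((x, y))
            counts[y] += 1
    return data


def mdr_rule(train, factors):
    """Label a cell 1 if its case/control ratio exceeds the overall
    case/control ratio in the training data (MDR-style rule)."""
    cases, controls = defaultdict(int), defaultdict(int)
    for x, y in train:
        cell = tuple(x[j] for j in factors)
        (cases if y == 1 else controls)[cell] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())

    def predict(x):
        cell = tuple(x[j] for j in factors)
        # Cross-multiplied comparison of class-normalised frequencies;
        # cells unseen in training are labelled 0.
        return int(cases.get(cell, 0) * n_controls
                   > controls.get(cell, 0) * n_cases)

    return predict


def balanced_error(test, predict):
    """Average of the per-class misclassification rates."""
    errors, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, y in test:
        totals[y] += 1
        errors[y] += predict(x) != y
    return 0.5 * sum(errors[c] / totals[c] for c in (0, 1))


def stratified_folds(data, k, seed=0):
    """Split cases and controls separately so every fold has both classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        group = [d for d in data if d[1] == label]
        rng.shuffle(group)
        for i, d in enumerate(group):
            folds[i % k].append(d)
    return folds


def cv_balanced_error(data, factors, k=5, seed=0):
    """K-fold cross-validated balanced error for a subset of factors."""
    folds = stratified_folds(data, k, seed)
    total = 0.0
    for i in range(k):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        total += balanced_error(folds[i], mdr_rule(train, factors))
    return total / k


if __name__ == "__main__":
    sample = make_stratified_sample(n_cases=100, n_controls=100, seed=1)
    for subset in [(0, 1), (2, 3)]:
        print(subset, cv_balanced_error(sample, subset, k=5))
```

In this toy model the relevant pair (0, 1) should receive a much smaller estimated error than the irrelevant pair (2, 3), which is the kind of comparison a factor-selection procedure based on error estimates relies on.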

Author Biographies

Alexander Bulinski, Lomonosov Moscow State University, Russia
Faculty of Mathematics and Mechanics, Professor
Alexey Kozhevin, Lomonosov Moscow State University, Russia
Faculty of Mathematics and Mechanics, PhD student

Published
2017-03-04
How to Cite
Bulinski, A., & Kozhevin, A. (2017). New version of the MDR method for stratified samples. Statistics, Optimization & Information Computing, 5(1), 1-18. https://doi.org/10.19139/soic.v5i1.277
Section
Research Articles