Variable Selection in Count Data Regression Model based on Firefly Algorithm

Zakariya Algamal

Abstract


Variable selection is a helpful procedure for improving computational speed and prediction accuracy by identifying the most important variables related to the response variable. Count data regression modeling has received much attention in several scientific fields, in which the Poisson and negative binomial regression models are the most basic. The firefly algorithm is one of the recently proposed nature-inspired algorithms and can be efficiently employed for variable selection. In this work, the firefly algorithm is proposed to perform variable selection for count data regression models. Extensive simulation studies and two real data applications are conducted to evaluate the performance of the proposed method in terms of prediction accuracy and variable selection criteria, and its performance is compared with other methods. The results demonstrate the efficiency of the proposed method, which outperforms other popular methods.
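The paper's own implementation is not included here. As a rough illustration of the approach the abstract describes, the following is a minimal sketch of a binary firefly algorithm selecting predictors for a Poisson regression, using AIC as the brightness (fitness) criterion. All function names, parameter values, and the synthetic data are illustrative assumptions, not the authors' method or data.

```python
import math
import random

random.seed(0)

# Synthetic count data: y ~ Poisson(exp(x' beta)); only the first two of
# five candidate predictors truly relate to the response (an assumption
# for illustration, not data from the paper).
n, p = 50, 5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
true_beta = [0.8, -0.6, 0.0, 0.0, 0.0]

def rpois(lam):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    limit, k, prod = math.exp(-lam), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

y = [rpois(math.exp(sum(b * v for b, v in zip(true_beta, row)))) for row in X]

def poisson_aic(mask):
    """Fit a Poisson GLM on the selected columns by plain gradient ascent
    on the log-likelihood and return its AIC (lower = brighter firefly)."""
    cols = [j for j in range(p) if mask[j]]
    if not cols:
        return float("inf")                     # penalize the empty model
    beta = [0.0] * (len(cols) + 1)              # intercept + selected slopes
    for _ in range(100):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            z = [1.0] + [row[j] for j in cols]
            mu = math.exp(min(sum(b * v for b, v in zip(beta, z)), 20.0))
            for k, v in enumerate(z):
                grad[k] += (yi - mu) * v
        beta = [b + 0.03 / n * g for b, g in zip(beta, grad)]
    ll = 0.0
    for row, yi in zip(X, y):
        z = [1.0] + [row[j] for j in cols]
        eta = min(sum(b * v for b, v in zip(beta, z)), 20.0)
        ll += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return -2.0 * ll + 2.0 * len(beta)

def binary_firefly(n_fireflies=5, n_iter=8, gamma=1.0, beta0=1.0, alpha=0.3):
    """Binary firefly search: dimmer fireflies move toward brighter ones;
    the continuous move is squashed through a sigmoid to flip bits."""
    pop = [[random.randint(0, 1) for _ in range(p)] for _ in range(n_fireflies)]
    fit = [poisson_aic(f) for f in pop]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fit[j] < fit[i]:             # firefly j is brighter
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    attract = beta0 * math.exp(-gamma * r2)
                    trial = []
                    for a, b in zip(pop[i], pop[j]):
                        step = a + attract * (b - a) + alpha * (random.random() - 0.5)
                        prob = 1.0 / (1.0 + math.exp(-step))
                        trial.append(1 if random.random() < prob else 0)
                    f_trial = poisson_aic(trial)
                    if f_trial < fit[i]:        # greedy acceptance
                        pop[i], fit[i] = trial, f_trial
    best = min(range(n_fireflies), key=fit.__getitem__)
    return pop[best], fit[best]

mask, aic = binary_firefly()
print("selected predictors:", [j + 1 for j in range(p) if mask[j]])
print("AIC of selected model:", round(aic, 2))
```

The sigmoid mapping from the continuous firefly move to a bit flip follows the common binarization used for discrete metaheuristics; other transfer functions (e.g. V-shaped) are equally valid choices.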

Keywords


Variable selection; count data; Poisson regression; negative binomial regression; firefly algorithm




DOI: 10.19139/soic.v7i2.566
