Variable Selection in Count Data Regression Model based on Firefly Algorithm

Variable selection is a very helpful procedure for improving computational speed and prediction accuracy by identifying the most important variables that are related to the response variable. Count data regression modeling has received much attention in several scientific fields, in which the Poisson and negative binomial regression models are the most basic models. The firefly algorithm is one of the recently proposed, efficient nature-inspired algorithms, and it can be employed effectively for variable selection. In this work, the firefly algorithm is proposed to perform variable selection for count data regression models. Extensive simulation studies and two real data applications are conducted to evaluate the performance of the proposed method in terms of prediction accuracy and variable selection criteria, and its performance is compared with other methods. The results prove the efficiency of the proposed method and show that it outperforms other popular methods.


Introduction
In regression modeling, data in the form of counts are common. Count data regression modeling has received much attention in medicine, behavioral sciences, psychology, and econometrics [1,2,3,36]. The Poisson and negative binomial regression models are the most basic count data regression models [4]. The problem of overdispersion usually occurs in count data. Unlike the Poisson regression model, the negative binomial regression model can handle the overdispersion issue [5,6,31].
In many real applications, recent technological developments have made it possible to measure a large number of variables. In regression modeling, the presence of a huge number of variables has a negative effect, as it overfits the regression model. Therefore, identifying a small subset of important variables from a large candidate set plays an important role in building accurate predictive regression models [7,35].
Recently, nature-inspired algorithms, such as the genetic algorithm, particle swarm optimization, the firefly algorithm, and the crow search algorithm, have attracted great interest and have proved their efficiency as variable selection methods [12]. This is because the main target in variable selection is to minimize the number of selected variables while maintaining the maximum prediction accuracy, and variable selection can therefore be considered an optimization problem [13].
Several researchers have employed nature-inspired algorithms for variable selection in regression models. [14] employed the genetic algorithm for variable selection in linear and partial least squares regression models, with an application in chemometrics. Drezner et al. [15] proposed using the tabu search algorithm for model selection in the linear regression model. A hybrid of the genetic algorithm and simulated annealing was proposed as a subset selection method for the linear regression model by [16]. [17] compared simulated annealing algorithms for variable selection in principal component analysis and discriminant analysis. The differential evolution algorithm was used for variable selection in the linear regression model by [18]. Nature-inspired algorithms have also been used for variable selection in generalized linear models, such as the logistic regression model [19,20], the Poisson regression model [21,22,32], and the gamma regression model [23].
The purpose of this paper is to propose the firefly algorithm, a swarm intelligence technique, as an alternative variable selection method for count data regression models. The proposed algorithm efficiently identifies the most relevant variables in the count data regression model while attaining high prediction accuracy. The superiority of the proposed algorithm is demonstrated through different simulation settings and real data applications.

Count data regression model
Let $y_i$, $i = 1, \dots, n$, denote the count response and let $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})^T \in \mathbb{R}^p$ be a $p \times 1$ known explanatory variable vector. Assume that $y_i$ is count data and has a Poisson distribution with probability density function
$$f(y_i; \theta) = \frac{e^{-\theta}\,\theta^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \dots, \qquad (1)$$
where $\theta > 0$ is the parameter of the Poisson distribution. In the Poisson regression model (PRM), the conditional mean is represented as $\theta_i = \exp(\mathbf{x}_i^T \boldsymbol{\beta})$, where $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)^T$ is a $(p + 1) \times 1$ vector of unknown regression coefficients. The PRM can be defined as
$$y_i \sim \mathrm{Po}\big(\exp(\mathbf{x}_i^T \boldsymbol{\beta})\big). \qquad (2)$$
The log-likelihood function of Eq. (2) is defined as
$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \big[\, y_i\, \mathbf{x}_i^T \boldsymbol{\beta} - \exp(\mathbf{x}_i^T \boldsymbol{\beta}) - \log(y_i!) \,\big]. \qquad (3)$$
The maximum likelihood estimator of the PRM is obtained by setting the first derivative of Eq. (3) to zero,
$$\frac{\partial \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \big[\, y_i - \exp(\mathbf{x}_i^T \boldsymbol{\beta}) \,\big]\, \mathbf{x}_i = \mathbf{0}, \qquad (4)$$
which has no closed-form solution. The iteratively reweighted least squares algorithm is therefore used to obtain the maximum likelihood estimator (MLE) of the PRM as
$$\hat{\boldsymbol{\beta}}^{(t+1)} = \big(\mathbf{X}^T \hat{\mathbf{W}}^{(t)} \mathbf{X}\big)^{-1} \mathbf{X}^T \hat{\mathbf{W}}^{(t)} \hat{\mathbf{z}}^{(t)}, \qquad (5)$$
where $\hat{\mathbf{W}} = \mathrm{diag}(\hat{\theta}_i)$ and $\hat{\mathbf{z}}$ is the working response vector. One of the most important assumptions in the PRM is that the mean and variance of the response variable are equal. When this assumption is violated, the PRM suffers from overdispersion. In real applications, the conditional variance can exceed the conditional mean, and the negative binomial regression model (NBRM) is then more appropriate than the PRM for modeling count data [24,25,33,34]. In the NBRM, random variation is introduced into the Poisson conditional mean as $\alpha_i = z_i \theta_i$, where $z_i$ is a random variable having a gamma distribution, $z_i \sim \Gamma(\lambda, \lambda)$.
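As an illustration (not taken from the paper), the PRM can be fitted in R with glm(), which implements exactly the iteratively reweighted least squares scheme of Eq. (5); the data, dimensions, and coefficients below are simulated and purely illustrative.

# Minimal sketch: fitting a PRM by maximum likelihood in R.
# glm() maximizes the log-likelihood in Eq. (3) via iteratively
# reweighted least squares, as in Eq. (5).
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)             # illustrative covariates
beta <- c(0.5, 1, -1, 0.5)                  # illustrative (p + 1) coefficients
theta <- exp(cbind(1, X) %*% beta)          # conditional mean exp(x_i' beta)
y <- rpois(n, lambda = theta)               # PRM response, Eq. (2)

dat <- data.frame(y = y, X)
prm <- glm(y ~ ., family = poisson(link = "log"), data = dat)
coef(prm)       # MLE of beta
logLik(prm)     # maximized log-likelihood, Eq. (3)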
Assume that $y_i$ is count data and has a negative binomial distribution with probability density function
$$f(y_i; \theta_i, \tau) = \frac{\Gamma(y_i + \tau^{-1})}{\Gamma(y_i + 1)\,\Gamma(\tau^{-1})} \left(\frac{\tau^{-1}}{\tau^{-1} + \theta_i}\right)^{\tau^{-1}} \left(\frac{\theta_i}{\tau^{-1} + \theta_i}\right)^{y_i}, \qquad (6)$$
where $\tau \geq 0$ is the overdispersion parameter, defined as $\tau = \lambda^{-1}$. The NBRM coefficients are usually estimated by the ML estimator, which is obtained by maximizing the log-likelihood function
$$\ell(\boldsymbol{\beta}, \tau) = \sum_{i=1}^{n} \Big[\, y_i \log\!\Big(\frac{\tau \theta_i}{1 + \tau \theta_i}\Big) - \tau^{-1} \log(1 + \tau \theta_i) + \log \Gamma(y_i + \tau^{-1}) - \log \Gamma(y_i + 1) - \log \Gamma(\tau^{-1}) \,\Big]. \qquad (7)$$
The ML estimator is then obtained by solving the score equations of Eq. (7),
$$\frac{\partial \ell(\boldsymbol{\beta}, \tau)}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \frac{y_i - \theta_i}{1 + \tau \theta_i}\, \mathbf{x}_i = \mathbf{0}, \qquad (8)$$
which requires an iterative algorithm such as Newton-Raphson.
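Similarly, the NBRM can be fitted with glm.nb() from the MASS package, which estimates the coefficients and the dispersion jointly by maximum likelihood; note that the dispersion parameter reported by glm.nb() is the reciprocal of $\tau$ as defined above. The sketch below uses simulated, illustrative data.

# Minimal sketch: fitting an NBRM with MASS::glm.nb(), Eqs. (6)-(8).
library(MASS)
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)
beta <- c(0.5, 1, -1, 0.5)
mu <- exp(cbind(1, X) %*% beta)
tau <- 2                                   # overdispersion parameter
y <- rnbinom(n, mu = mu, size = 1 / tau)   # NB response with Var = mu + tau * mu^2

dat <- data.frame(y = y, X)
nbrm <- glm.nb(y ~ ., data = dat)
coef(nbrm)          # ML estimates of beta
1 / nbrm$theta      # estimate of the overdispersion parameter tau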

Firefly algorithm
In recent years, "numerous nature-inspired algorithms have been proposed as powerful approaches to solve the continuous optimization problems. Minimizing the number of variables with maximizing the accuracy of prediction is an optimization problem [27]. Firefly optimization algorithm (FA) is one of the recently efficient proposed natureinspired algorithms, which is firstly introduced by [26]. The application of FA is an easy algorithm for solving the optimization problems compared with other algorithms. FA is inspired by the social behavior of fire?ies through flashing lights. FA enables a swarm of fireflies with low light intensities to move towards the neighbor brighter fireflies possessing superior search abilities in solving optimization problems. Three rules are held in FA [27]. The first rule is that all fireflies are unisex meaning that one firefly will be attracted to other fireflies regardless of their sex. The second rule is that the degree of the attractiveness of a firefly is proportion to its brightness, therefore for any two flashing fireflies, the less bright one will move towards the brighter one and the more brightness. If there is no brighter one than a particular firefly, it will move randomly. The third rule is that the brightness of a firefly is somehow related to the analytical form of the fitness function. For a maximization problem, the brightness of each firefly is proportional to the value of the cost function. Let d represents the dimension of the object function that will optimized, n f i represents the number of fireflies, δ refers the light absorption coefficient, I i is the light intensity, and r is the distance between any two firefly locations (s i j) and (s j r( This Cartesian distance can be defined as Because I i decreases when the distance from the source increases, the variations of should be monotonically decreasing function. As a result, in most applications, the I i can be approximated as where I 0 is the original light intensity. Because the attractiveness of a firefly is proportional to the I i , the attractiveness φ of a firefly is defined as where φ 0 represents the attractiveness at r = 0. FA originally is proposed to solve continuous optimization problems. However, in variable selection, the optimization problem is discrete. A binary firefly algorithm (BFA) is proposed by [28] to deal with the problem of variable selection where the position is binary. Because variable selection problem is to select a specific variable or not, thus the solution is expressed as a binary vector, where the value 1 indicates a variable to be selected and 0 otherwise.
Accordingly, the position of a firefly is binarized as follows:
$$s_{ij}^{(t+1)} = \begin{cases} 1, & \text{if } T\big(s_{ij}^{(t+1)}\big) > k_2, \\ 0, & \text{otherwise}, \end{cases} \qquad (13)$$
where $T(\cdot)$ is a transfer function mapping the continuous position to $[0, 1]$ and $k_2$ is a random number generated from the uniform distribution on $[0, 1]$. The pseudo code of the BFA is given in Figure 1. Consequently, the proposed algorithm setting is as follows:
Step 1: The number of fireflies is $n_f = 40$, $\phi_0 = 1$, $\delta = 0.2$, $\alpha = 0.1$, and the maximum number of iterations is $t_{\max} = 500$.
Step 2: The positions of the fireflies are randomly generated from the uniform distribution on [0, 1]. The representation of the positions of a firefly is depicted in Figure 2.
Step 3: The fitness function is evaluated for each firefly.
Step 4: The positions of the fireflies are updated using Eq. (13).
Step 5: Steps 3 and 4 are repeated until $t_{\max}$ is reached (an illustrative code sketch of these steps follows).
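The following R sketch assembles Steps 1-5 for variable selection in a PRM. It is an illustrative reading of the algorithm rather than the paper's implementation: the transfer function of Eq. (13) is assumed to be a sigmoid, and the fitness is taken to be the AIC of the fitted PRM, since the paper's own fitness function is not reproduced here. Much smaller values of nf and t_max than the Step 1 settings are advisable for a quick trial.

# Illustrative binary-firefly variable selection for a PRM (assumptions noted above).
bfa_select <- function(X, y, nf = 40, t_max = 500,
                       phi0 = 1, delta = 0.2, alpha = 0.1) {
  p   <- ncol(X)
  pos <- matrix(runif(nf * p), nf, p)          # Step 2: continuous positions in [0, 1]
  bin <- (pos > 0.5) * 1                       # initial binary positions
  fitness <- function(b) {                     # Step 3 (assumed): AIC of the fitted PRM
    if (sum(b) == 0) return(Inf)
    d <- data.frame(y = y, X[, b == 1, drop = FALSE])
    AIC(glm(y ~ ., family = poisson, data = d))
  }
  fit <- apply(bin, 1, fitness)
  for (t in seq_len(t_max)) {                  # Step 5: iterate
    for (i in seq_len(nf)) for (j in seq_len(nf)) {
      if (fit[j] < fit[i]) {                   # firefly j is "brighter" (lower AIC)
        r2  <- sum((pos[i, ] - pos[j, ])^2)
        phi <- phi0 * exp(-delta * r2)                           # Eq. (11)
        pos[i, ] <- pos[i, ] + phi * (pos[j, ] - pos[i, ]) +
          alpha * runif(p, -0.5, 0.5)                            # Eq. (12)
        bin[i, ] <- (runif(p) < 1 / (1 + exp(-pos[i, ]))) * 1    # Step 4, Eq. (13), assumed sigmoid
        fit[i]   <- fitness(bin[i, ])
      }
    }
  }
  bin[which.min(fit), ]                        # best subset found (1 = variable selected)
}

# Quick illustrative run on simulated data (reduced nf and t_max for speed):
# set.seed(1); X <- matrix(rnorm(100 * 8), 100, 8)
# y <- rpois(100, exp(0.5 + X[, 1] - X[, 2]))
# bfa_select(X, y, nf = 10, t_max = 20)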

Computational results
In this section, "the performance of our proposed variable selection method, FA-BN is tested. Further, the performance of FA-BN is compared with the Akaike information criteria (AIC), corrected Akaike information criteria (CAIC), and Bayesian information criteria (BIC) that are defined as, respectively, where ℓ(β) is the log-likelihood for either PRM or NBRM, and q is the number of selected variables".

Simulation results
In this section, "the same simulation settings of [29] and [4] are used. Each simulation setting is considered for PRM and NBRM. The sample size is considered with n ∈ {50, 100, 200}. For all the simulation examples 1-3, the response variable is generated according to PRM as y i ∼ P o(exp(x i T β)). Simulations 4-6 are the same as the setting of simulations 1 C 3 where the response variable is generated according to NBRM with conditional mean exp(x i T β) and τ = 2. For performance evaluation of the FA-BN, the mean squared error (MSE) is used as a prediction accuracy criteria, which is defined as In terms of variable selection performance, the number of the truly nonzero coefficients which are incorrectly set to zero (I), and the number of the true zero coefficients which are correctly set to zero (C). The higher the values of C, and the lower the values of I, the better the variable selection performance is. All computations of this paper were conducted using R. Based on 500 times of repeating simulation, the averaged MSE, I, and C with their associated standard deviations (the number in parentheses) are listed in Tables 1-6, respectively, for PRM and NBRM.
These tables show that the FA-BN method achieves a significant improvement, with a much better average MSE than the AIC, CAIC, and BIC methods. For instance, in Table 1 when n = 50, the MSE reduction by FA-BN was about 56.91%, 49.07%, and 44.43% compared with AIC, CAIC, and BIC, respectively. Further, regardless of the value of n, FA-BN consistently shows the smallest MSE among the competing methods.
In terms of variable selection performance, our proposed method clearly selects very few irrelevant variables compared with AIC, CAIC, and BIC, in that the number of true zero coefficients correctly set to zero is high compared with the others. For example, in Table 4 when n = 50, FA-BN on average fails to select one of the 10 important variables. In the same case, AIC, CAIC, and BIC select on average more than 6 important variables.
From the results of simulation 3 (Table 3) and simulation 6 (Table 6), the model is dense, and therefore all the methods have zero values for the criterion C. On the other hand, FA-BN is the best because the number of nonzero variables identified as irrelevant is smaller than for AIC, CAIC, and BIC. It is worth noting that AIC has inferior performance in all simulation examples compared with the CAIC, BIC, and FA-BN methods.
In summary, the simulation results for both the PRM and the NBRM demonstrate the effectiveness of FA-BN for variable selection. Another important point concluded from the simulation results is that the variable selection performance of FA-BN is not affected by changing the sample size.

Real application results
In this section, "two real applications are considered. The first real application related to the PRM, while the second real application related to the NBRM. For the first real application, the number of publications produced by Ph.D. biochemists of [30] is considered where the response variable is the number of articles in last three years of Ph.D. Five explanatory variables were used. They are: the gender (x 1 ), the marital status (x 2 ), the number of children under age six (x 3 ), prestige of Ph.D. program (x 4 ), and the number of articles by the mentor in last three years (x 5 ). In this application, the response variable is following Poisson distribution. Depending on the PRM analysis, four explanatory variables, x 1 , x 2 , x 3 , and x 5 , are significantly related to the response variables with a level of significant 0.05.
In the second real application, we considered the nuts dataset [24]. The dataset contains 52 observations and 7 explanatory variables. The nuts dataset concerns squirrel behavior and several features of the forest across different plots in Scotland's Abernathy Forest. The response variable is the number of cones stripped. According to the NBRM analysis, five explanatory variables, $x_1$, $x_2$, $x_3$, $x_5$, and $x_6$, are significantly related to the response variable at the 0.05 significance level. Tables 7 and 8 summarize the MSE and the selected variables for each method for the two real data applications, respectively. As seen from Tables 7 and 8, FA-BN remarkably reduces the MSE compared with AIC, CAIC, and BIC. In terms of selected variables, Table 7 clearly shows that FA-BN selects only 4 of the 5 variables when the PRM is assumed. FA-BN selected the explanatory variables $x_1$, $x_2$, $x_3$, and $x_5$, which are identified as relevant to the study. Compared with AIC and BIC, FA-BN includes an extra variable, but its MSE is lower. Further, AIC, CAIC, and BIC selected one irrelevant variable ($x_4$), indicating that these methods can select unimportant variables. Regarding Table 8, FA-BN shows similar results in terms of selected variables: it selects only 4 of the 7 variables when the NBRM is assumed. FA-BN selected the explanatory variables $x_1$, …, $x_5$, and $x_6$, which are identified as relevant to the study. Each of AIC, CAIC, and BIC, on the other hand, shows a tendency to select irrelevant variables.
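As a generic sketch for this application, the full NBRM and a reduced NBRM fitted on a selected subset can be compared by their in-sample MSE as below; the data-frame name and column names in the commented example are hypothetical placeholders, since the exact variable names of the nuts data are not reproduced here.

# Fit the full NBRM and a reduced NBRM (e.g. an FA-BN subset) and compare MSE.
library(MASS)
compare_nbrm <- function(df, response, selected_vars) {
  full    <- glm.nb(reformulate(".", response), data = df)
  reduced <- glm.nb(reformulate(selected_vars, response), data = df)
  y <- df[[response]]
  c(MSE_full    = mean((y - fitted(full))^2),
    MSE_reduced = mean((y - fitted(reduced))^2))
}

# e.g. compare_nbrm(nuts_df, "cones", c("ntrees", "height", "cover"))  # hypothetical names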

Conclusion
In this paper, the problem of selecting variables in count data regression models is considered. A firefly algorithm was proposed as a variable selection method. The simulation studies and the real data applications demonstrated the superiority of the FA-BN in terms of MSE, I, and C compared with the AIC, CAIC, and BIC methods.