A New Distribution for Modeling Lifetime Data with Different Methods of Estimation and Censored Regression Modeling

In this paper, after introducing a new model along with its properties, we estimate the unknown parameters of the new model using the maximum likelihood method, the Cramér-von Mises method, the bootstrapping method, the least squares method and the weighted least squares method. We assess the performance of all estimation methods via simulations. All methods perform well, but the bootstrapping method is the best in modeling relief times, whereas the maximum likelihood method is the best in modeling survival times. Censored data modeling with covariates is addressed along with the index plot of the modified deviance residuals and its Q-Q plot.


Introduction
Let the random variable (rv) X follow an Exponentiated Exponential (EE) distribution (see Gupta et al. [5]) with probability density function (pdf) π_(a,b)(x) and cumulative distribution function (cdf) Π_(a,b)(x), given respectively by In this work we introduce the first extension of (1) and (2) using the Burr XII-G (BrXII-G) family, introduced by Cordeiro et al. [4]. The cdf of the BrXII-G family of distributions is defined by The pdf corresponding to (3) is given by

where π_ψ(x) is the baseline density. The hazard rate function (hrf) of X is τ_(α,β,ψ) Modeling the different characteristics of lifetime data sets is an important issue, and researchers have introduced sophisticated lifetime models to remove the drawbacks of existing models, such as the new odd log-logistic half-logistic by Alizadeh et al. [1], the Zografos-Balakrishnan odd log-logistic generalized half-normal distribution by Mozafari et al. [13] and Altun et al. [2], the Topp-Leone generated Burr XII by Yousof et al. [16] and the odd log-logistic Marshall-Olkin generalized half-normal by Korkmaz et al. [10], among others. Now, we introduce a new lifetime model to create a new opportunity in modeling the different characteristics of lifetime data sets. By inserting (2) into (3) we obtain the cdf of the BrXIIEE as The pdf corresponding to (5) is given by The hrf corresponding to (6) reduces to τ_(α,β,a,b) For more details about the BrXII-G family see Cordeiro et al. [4]. For more details about the properties of the EE model see Gupta and Kundu [6], Gupta and Kundu [7] and Nadarajah [14]. Following Cordeiro et al. [4], we provide a transformation to generate random variables from the BrXIIEE distribution. Let U be a U(0, 1) random variable and W = has cdf (5). Again following Cordeiro et al. [4], the physical interpretation of the BrXIIEE distribution is given. Let Y be an rv following the EE density. The odds ratio that an individual having the rv Y will die (fail) at time x is Note that the function ζ(X) is monotonic and non-decreasing. Then, if the researcher wants to model the randomness of the odds by the rv T following the BrXIIEE density, given in (5), we can write which is identical to (2). Consequently, if X has the BrXIIEE model, then T = ζ(X) has the BrXIIEE cdf given by (5).
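The inversion method described above can be sketched in Python. This is only an illustration under stated assumptions: the EE cdf is taken as G(x) = (1 - e^{-bx})^a (the paper's own assignment of a and b may differ), and the BrXII-G cdf is taken as F(x) = 1 - {1 + [G(x)/(1 - G(x))]^α}^{-β}, which matches the odds-ratio interpretation. The function names are hypothetical.

```python
import math
import random

def brxiiee_cdf(x, alpha, beta, a, b):
    """Assumed BrXIIEE cdf: 1 - {1 + [G/(1-G)]^alpha}^(-beta), G = (1 - e^{-bx})^a."""
    if x <= 0.0:
        return 0.0
    g = (-math.expm1(-b * x)) ** a           # assumed EE cdf
    if g >= 1.0:
        return 1.0
    odds = g / (1.0 - g)                     # odds ratio G / (1 - G)
    return 1.0 - (1.0 + odds ** alpha) ** (-beta)

def brxiiee_quantile(u, alpha, beta, a, b):
    """Invert the assumed cdf for 0 < u < 1."""
    odds = math.expm1(-math.log1p(-u) / beta) ** (1.0 / alpha)
    log_g = -math.log1p(1.0 / odds)          # log G = log(odds / (1 + odds))
    return -math.log(-math.expm1(log_g / a)) / b

def brxiiee_sample(n, alpha, beta, a, b, seed=None):
    """Draw n variates by applying the quantile function to uniform draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:                          # the quantile is undefined at u = 0
            draws.append(brxiiee_quantile(u, alpha, beta, a, b))
    return draws
```

With α = β = a = 1 the assumed model collapses to an exponential distribution with rate b, which gives a convenient sanity check on the sampler.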
The cdf (5) of X can be expressed as and Applying (8) for A in (7) we obtain Second, using the binomial expansion, the last equation can be expressed as Third, applying (9) for B in the last equation gives where By differentiating (10), we obtain Equation (11) is the main result of this section. It reveals that the BrXIIEE density is a linear combination of EE densities. So, some of its mathematical properties can be easily determined from those of the EE density.

Figure 1 displays some possible shapes of the BrXIIEE distribution. As seen from Figure 1, the proposed distribution is a good choice for modeling right-skewed and symmetric data sets. Moreover, the BrXIIEE distribution provides a very flexible treatment for lifetime modeling since its hrf can take constant, increasing, decreasing, upside-down and bathtub shapes.

The rest of the paper is outlined as follows. Some properties of the new model are derived in Section 2. Five methods of estimation are described in Section 3. Simulation studies are carried out to compare the performance of the five methods for the proposed model in Section 4. A new regression model for censored data is presented in Section 5. Empirical results for univariate data modeling and censored data modeling with covariates are addressed in Section 6. Section 7 offers some concluding remarks.

Moments and generating function
The rth ordinary moment of X, say µ′_r = E(X^r), is determined from (11) as The rth incomplete moment of X, say φ_r(t), can be determined from (11) as where The moment generating function M(t) = E(e^{tX}) of X follows from (11) as
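As a numerical cross-check on series expressions for the moments, E(X^r) can also be approximated as the integral of Q(u)^r over (0, 1), where Q is the quantile function. The sketch below is not the paper's series formula; it uses a midpoint rule and assumes the EE cdf G(x) = (1 - e^{-bx})^a and the BrXII-G cdf 1 - {1 + [G/(1 - G)]^α}^{-β}. All function names are hypothetical.

```python
import math

def brxiiee_quantile(u, alpha, beta, a, b):
    """Quantile of the assumed BrXIIEE cdf for 0 < u < 1."""
    odds = math.expm1(-math.log1p(-u) / beta) ** (1.0 / alpha)
    log_g = -math.log1p(1.0 / odds)
    return -math.log(-math.expm1(log_g / a)) / b

def brxiiee_moment(r, alpha, beta, a, b, n_grid=100000):
    """Midpoint-rule approximation of E(X^r) = integral of Q(u)^r over (0, 1)."""
    h = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        total += brxiiee_quantile((k + 0.5) * h, alpha, beta, a, b) ** r
    return h * total
```

For α = β = a = 1 the assumed model is exponential with rate b, so the first moment should be close to 1/b.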

Probability weighted moments
The probability weighted moments (PWMs) are used to estimate unknown model parameters. This approach often produces better results than the standard method of moments estimation. The (s, r)th PWM of X, denoted by ρ_{s,r}, is formally defined by Using (5), we have Using the Taylor series for z^λ, we have Firstly, we apply the Taylor series of z^λ in Secondly, using (6) and the above equation, we have Applying (8) for C in the last equation, we obtain Thirdly, using the binomial expansion for D, the above equation can be rewritten as Applying (9) for E in the last equation gives and then So, the (s, r)th PWM of X is obtained as
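Taking the usual definition ρ_{s,r} = E[X^s F(X)^r], the PWM can be checked numerically because F(X) is uniform on (0, 1), so ρ_{s,r} equals the integral of Q(u)^s u^r over (0, 1). The following sketch makes the same assumptions about the EE and BrXII-G forms as stated in its comments, and its names are hypothetical; it is a cross-check, not the series expression derived in this section.

```python
import math

def brxiiee_quantile(u, alpha, beta, a, b):
    """Quantile of the assumed cdf 1 - {1 + [G/(1-G)]^alpha}^(-beta), G = (1 - e^{-bx})^a."""
    odds = math.expm1(-math.log1p(-u) / beta) ** (1.0 / alpha)
    log_g = -math.log1p(1.0 / odds)
    return -math.log(-math.expm1(log_g / a)) / b

def brxiiee_pwm(s, r, alpha, beta, a, b, n_grid=100000):
    """Midpoint-rule approximation of E[X^s F(X)^r]."""
    h = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        u = (k + 0.5) * h
        total += brxiiee_quantile(u, alpha, beta, a, b) ** s * u ** r
    return h * total
```

In the exponential special case (α = β = a = b = 1), ρ_{1,1} has the closed form 3/4, which the approximation should reproduce.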


Residual life and reversed residual life functions
The nth moment of the residual life of X is given by

The mean residual life (MRL) function is given by
which represents the expected residual life for an individual who is alive at age t. The MRL of X can be obtained by putting n = 1 in the equation of the moment of the residual life. The nth moment of the reversed residual life, say Then, the nth moment of the reversed residual life of X is The mean inactivity time (MIT), also known as the mean reversed residual life function, can be obtained by putting n = 1 in the equation of the moment of the reversed residual life.
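Using the standard definitions E[(X - t)^n | X > t] and E[(t - X)^n | X ≤ t], both quantities can be approximated by averaging over the relevant part of the quantile function. The sketch below assumes the EE cdf G(x) = (1 - e^{-bx})^a and the BrXII-G cdf 1 - {1 + [G/(1 - G)]^α}^{-β}; the function names are hypothetical and the midpoint rule stands in for the series expressions of this section.

```python
import math

def brxiiee_cdf(x, alpha, beta, a, b):
    """Assumed BrXIIEE cdf (see the assumptions in the lead-in)."""
    if x <= 0.0:
        return 0.0
    g = (-math.expm1(-b * x)) ** a
    if g >= 1.0:
        return 1.0
    return 1.0 - (1.0 + (g / (1.0 - g)) ** alpha) ** (-beta)

def brxiiee_quantile(u, alpha, beta, a, b):
    odds = math.expm1(-math.log1p(-u) / beta) ** (1.0 / alpha)
    log_g = -math.log1p(1.0 / odds)
    return -math.log(-math.expm1(log_g / a)) / b

def residual_life_moment(t, n, alpha, beta, a, b, n_grid=100000):
    """E[(X - t)^n | X > t]: average of (Q(u) - t)^n for u in (F(t), 1)."""
    p = brxiiee_cdf(t, alpha, beta, a, b)
    h = (1.0 - p) / n_grid
    total = sum((brxiiee_quantile(p + (k + 0.5) * h, alpha, beta, a, b) - t) ** n
                for k in range(n_grid))
    return total / n_grid

def reversed_residual_life_moment(t, n, alpha, beta, a, b, n_grid=100000):
    """E[(t - X)^n | X <= t]: average of (t - Q(u))^n for u in (0, F(t))."""
    p = brxiiee_cdf(t, alpha, beta, a, b)
    h = p / n_grid
    total = sum((t - brxiiee_quantile((k + 0.5) * h, alpha, beta, a, b)) ** n
                for k in range(n_grid))
    return total / n_grid
```

Setting n = 1 gives the MRL and the MIT. In the exponential special case the MRL is constant in t (memorylessness), which is a quick way to validate the code.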

Maximum likelihood method
In this work, we estimate the unknown parameters (α, β, a, b) of the BrXIIEE model from complete samples by the maximum likelihood (ML) method. Let x_1, ..., x_n be a random sample from the BrXIIEE model with parameter vector Φ = (α, β, a, b). The log-likelihood function ℓ_n(Φ) for Φ is given by The above ℓ_n(Φ) can be maximized numerically via SAS (PROC NLMIXED), R (optim) or the Ox program (via sub-routine MaxBFGS), among others. The components of the score vector
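A log-likelihood of this kind can be coded directly and handed to any general-purpose optimizer. The sketch below is an assumption-laden illustration, not the paper's expression: it takes the EE cdf as G(x) = (1 - e^{-bx})^a and the BrXII-G density as αβ g G^{α-1} (1 - G)^{-(α+1)} {1 + [G/(1 - G)]^α}^{-(β+1)}, and works on the log scale for numerical stability. Function names are hypothetical.

```python
import math

def brxiiee_logpdf(x, alpha, beta, a, b):
    """Log-density of the assumed BrXIIEE model at x > 0."""
    e = -math.expm1(-b * x)                        # 1 - exp(-b x)
    log_G = a * math.log(e)                        # log of the assumed EE cdf
    log_g = math.log(a) + math.log(b) - b * x + (a - 1.0) * math.log(e)
    log_Gbar = math.log(-math.expm1(log_G))        # log(1 - G)
    z = alpha * (log_G - log_Gbar)                 # log of odds^alpha
    # Stable evaluation of log(1 + exp(z)).
    softplus = z + math.log1p(math.exp(-z)) if z > 0.0 else math.log1p(math.exp(z))
    return (math.log(alpha) + math.log(beta) + log_g
            + (alpha - 1.0) * log_G - (alpha + 1.0) * log_Gbar
            - (beta + 1.0) * softplus)

def brxiiee_loglik(params, data):
    """Sum of log-densities; returns -inf outside the parameter space."""
    alpha, beta, a, b = params
    if min(alpha, beta, a, b) <= 0.0:
        return float("-inf")
    return sum(brxiiee_logpdf(x, alpha, beta, a, b) for x in data)
```

Minimizing the negative of `brxiiee_loglik` over (α, β, a, b) with a quasi-Newton or simplex routine would then give the ML estimates; the choice of optimizer and starting values is left open here.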

Methods of least squares estimation (LSE) and weighted least squares estimation (WLSE)
The theory of LSE and WLSE was first proposed by Swain et al. [15] to estimate the parameters of the Beta distribution. It is based on minimizing the sum of squared differences between the theoretical cumulative distribution function and the empirical distribution function. Let F(x_{i:n}; α, β, a, b) denote the cdf of the BrXIIEE model and let x_1 < x_2 < ... < x_n be the ordered random sample of size n. The LSEs are obtained upon minimizing Using (5), we have The LSEs are obtained via solving the following non-linear equations are the values of the first derivatives, with respect to (w.r.t.) the parameters, of the cdf of the BrXIIEE distribution. The LSEs of the parameters α, β, a and b are obtained by solving the above simultaneous equations using any numerical approximation technique. The WLSEs are obtained by minimizing the given form of equation with respect to the parameters.
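The two criteria can be written as one objective function. The sketch below uses the forms usually attributed to Swain et al., namely the plotting positions i/(n + 1) and the weights (n + 1)^2 (n + 2) / [i (n - i + 1)]; these, together with the assumed cdf 1 - {1 + [G/(1 - G)]^α}^{-β} with G(x) = (1 - e^{-bx})^a, are assumptions rather than the paper's displayed equations. Function names are hypothetical.

```python
import math

def brxiiee_cdf(x, alpha, beta, a, b):
    """Assumed BrXIIEE cdf (see the assumptions in the lead-in)."""
    if x <= 0.0:
        return 0.0
    g = (-math.expm1(-b * x)) ** a
    if g >= 1.0:
        return 1.0
    return 1.0 - (1.0 + (g / (1.0 - g)) ** alpha) ** (-beta)

def ls_objective(params, data, weighted=False):
    """Sum of (weighted) squared gaps between F(x_(i)) and i / (n + 1)."""
    alpha, beta, a, b = params
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        w = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) if weighted else 1.0
        total += w * (brxiiee_cdf(x, alpha, beta, a, b) - i / (n + 1)) ** 2
    return total
```

Minimizing `ls_objective` with `weighted=False` gives the LSEs and with `weighted=True` the WLSEs; direct minimization avoids having to code the derivative equations.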

Method of Cramér-von Mises estimation (CVME)
The CVME of the parameters is based on the theory of minimum distance estimation (MDE). It was first proposed by MacDonald [11], who showed that the bias of this estimator is smaller than that of other minimum distance estimators. The CVMEs of the parameters α, β, a and b are obtained by minimizing the following expression w.r.t. α, β, a and b, respectively, then we have The Cramér-von Mises estimators (CVMEs) of the parameters are obtained by solving the non-linear equations that result from setting the first derivatives of this expression w.r.t. α, β, a and b to zero.
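A sketch of the minimum distance criterion follows, assuming the usual Cramér-von Mises form 1/(12n) + Σ [F(x_{i:n}) - (2i - 1)/(2n)]^2 and the same assumed BrXIIEE cdf as noted in the code comments. The function names are hypothetical.

```python
import math

def brxiiee_cdf(x, alpha, beta, a, b):
    """Assumed cdf: 1 - {1 + [G/(1-G)]^alpha}^(-beta), G = (1 - e^{-bx})^a."""
    if x <= 0.0:
        return 0.0
    g = (-math.expm1(-b * x)) ** a
    if g >= 1.0:
        return 1.0
    return 1.0 - (1.0 + (g / (1.0 - g)) ** alpha) ** (-beta)

def cvm_objective(params, data):
    """Cramér-von Mises distance between the fitted cdf and the sample."""
    alpha, beta, a, b = params
    xs = sorted(data)
    n = len(xs)
    total = 1.0 / (12.0 * n)
    for i, x in enumerate(xs, start=1):
        total += (brxiiee_cdf(x, alpha, beta, a, b) - (2 * i - 1) / (2.0 * n)) ** 2
    return total
```

The objective attains its lower bound 1/(12n) when the fitted cdf passes exactly through the points (2i - 1)/(2n); in practice it would be minimized numerically over the four parameters.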

Bootstrapping method
Bootstrapping is widely used in estimation as a bias-reduction technique. This method is preferred when the sample size is small, approximately less than 40. Detailed information on the bootstrap method can be found in Hesterberg [9].
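The text does not spell out which bootstrap scheme is used, so the following is only a generic sketch of nonparametric bootstrap bias correction, 2θ̂ - mean(θ̂*), where θ̂* are estimates from resamples drawn with replacement. The `estimator` argument is a hypothetical placeholder; in this paper's setting it would be a routine returning a BrXIIEE parameter estimate.

```python
import random
import statistics

def bootstrap_bias_corrected(data, estimator, n_boot=2000, seed=0):
    """Return the bias-corrected estimate 2*theta_hat - mean(bootstrap estimates)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    boot = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)    # draw n points with replacement
        boot.append(estimator(resample))
    return 2.0 * theta_hat - statistics.fmean(boot)
```

As a simple illustration, applying it to the plug-in variance `statistics.pvariance`, which is biased downwards by the factor (n - 1)/n, moves the estimate towards the unbiased sample variance.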

Simulation studies
Here, a simulation study is given to compare the finite sample performance of the estimation methods presented in Section 3. The simulation results are interpreted based on the averages of the estimates and the mean square errors (MSEs). We expect that, when the sample size is sufficiently large, the MSEs are near zero and the averages of the estimates are near the true parameter values. We generate 1000  Tables 1-5.  From Tables 1-5, we observe that the estimates for all selected methods show the property of consistency, i.e., the MSEs decrease as n increases. Upon comparing the different methods, the results show that the MLE produces the best results for estimating the parameters α, β, a and b in terms of MSEs in most cases. The ordering of performance of the estimators in terms of MSEs (from the best to the worst) for α is MLE, Bootstrap, WLSE and LSE. The ordering of performance for β is MLE, CVME, WLSE, LSE and Bootstrap; for a it is MLE, WLSE, Bootstrap, CVME and LSE; and for b it is MLE, WLSE, CVME, LSE and Bootstrap.
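The two summary measures used here, the average of the estimates and the MSE over replications, can be computed with a small Monte Carlo loop. The sketch below is generic: `sampler` and `estimator` are hypothetical placeholders, and the exponential rate example is only a stand-in for the BrXIIEE sampler and the five estimation methods.

```python
import random

def simulate(estimator, sampler, true_value, n, reps=1000, seed=0):
    """Return (average estimate, MSE) over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator(sampler(rng, n)) for _ in range(reps)]
    avg = sum(estimates) / reps
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return avg, mse

# Stand-in example: ML estimate of an exponential rate (true rate = 1).
def exp_sampler(rng, n):
    return [rng.expovariate(1.0) for _ in range(n)]

def exp_rate_mle(sample):
    return len(sample) / sum(sample)
```

Calling `simulate(exp_rate_mle, exp_sampler, 1.0, n)` for increasing n should show the MSE shrinking towards zero, which is the consistency pattern the tables are read for.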

The log-BrXIIEE regression model for censored data
Let the random variable X follow a BrXIIEE distribution. The density function of Y = log(X) with location and scale parameters is where µ ∈ ℜ and σ > 0 are the location and scale parameters, respectively. Hereafter, (14) is referred to as the log-BrXIIEE (LBrXIIEE) distribution and denoted as Y ∼ LBrXIIEE(α, β, a, b, σ, µ). The possible shapes of the pdf of the LBrXIIEE distribution are plotted in Figure 2 for α = 2.5, β = 0.5 and a = b ∈ {0.5, 0.6, 0.7, 0.8, 0.9}. As seen from Figure 2, the LBrXIIEE distribution can be used to model symmetric, left-skewed and right-skewed data sets. The survival function corresponding to (14) is Using the LBrXIIEE density given in (14), a location-scale regression model linking the explanatory variable vector v_i^⊤ = (v_i1, ..., v_ip) to the mean of the response variable y_i is where the random variable y_i follows a LBrXIIEE distribution. Let F and C be the sets of individuals for which y_i is the log-lifetime or log-censoring time, respectively. Assume that the observed lifetimes and censoring times are independent. The log-likelihood function of the LBrXIIEE regression model is given by where τ = (α, β, a, b, σ, β^⊤)^⊤ is the parameter vector, u_i = exp(z_i), z_i = (y_i − v_i^⊤ β)/σ, r is the number of uncensored observations (failures), n − r is the number of censored observations and n is the total number of observations. Note that when the parameter a = 0, the LBrXIIEE regression model reduces to the LBrXIIE regression model. The MLE of τ, say τ̂, can be obtained by maximizing the log-likelihood (17). Under regularity conditions, the asymptotic distribution of (τ̂ − τ) is multivariate normal N_{p+5}(0, K(τ)^{−1}), where K(τ) is the expected information matrix. The asymptotic covariance matrix K(τ)^{−1} of τ̂ can be replaced by the inverse of the (p + 5) × (p + 5) observed information matrix −L̈(τ).
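The structure of a right-censored location-scale log-likelihood, log-density contributions from the failures in F plus log-survival contributions from the censored cases in C, can be sketched generically. The LBrXIIEE density and survival function are not reproduced here; the standard extreme-value error distribution below is a stand-in, and all function names are hypothetical.

```python
import math

def censored_loglik(y, delta, v, coef, sigma, log_density, log_survival):
    """Sum log f(z_i)/sigma over failures (delta=1) and log S(z_i) over censored (delta=0)."""
    total = 0.0
    for yi, di, vi in zip(y, delta, v):
        mu_i = sum(c * x for c, x in zip(coef, vi))   # linear predictor v_i' beta
        z = (yi - mu_i) / sigma
        if di == 1:
            total += log_density(z) - math.log(sigma)
        else:
            total += log_survival(z)
    return total

# Stand-in error law (standard extreme value), not the LBrXIIEE model.
def ev_log_density(z):
    return z - math.exp(z)

def ev_log_survival(z):
    return -math.exp(z)
```

Swapping in the log-density and log-survival of the standardized LBrXIIEE variable, with its extra shape parameters, would give a function of the same shape as the log-likelihood described above.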

Residual analysis
Residual analysis is an essential part of any regression model. It is important to check the assumption on the error distribution. Here, two types of residuals are considered: martingale and modified deviance residuals. The martingale residual for the LBrXIIEE model is where u_i = exp(z_i) and z_i = (y_i − v_i^⊤ β)/σ. The modified deviance residual for the LBrXIIEE model is where r_Mi is the martingale residual.
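Both residuals depend on the fitted model only through the estimated survival probability at each observation. The sketch below uses the commonly cited definitions r_M = δ + log S(y) and r_D = sign(r_M) {-2 [r_M + δ log(δ - r_M)]}^{1/2}; that these coincide with the paper's displayed formulas is an assumption, and the function names are hypothetical.

```python
import math

def martingale_residual(delta, surv):
    """delta is 1 for a failure, 0 for a censored case; surv is the fitted S(y_i) in (0, 1)."""
    return delta + math.log(surv)

def modified_deviance_residual(delta, surv):
    r_m = martingale_residual(delta, surv)
    # delta * log(delta - r_m); the term vanishes for censored observations.
    log_term = math.log(1.0 - r_m) if delta == 1 else 0.0
    inner = -2.0 * (r_m + log_term)
    return math.copysign(math.sqrt(max(inner, 0.0)), r_m)
```

The survival probabilities passed in would come from the fitted LBrXIIEE regression model evaluated at each (y_i, v_i).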

Univariate data modeling
In this section, we compare the BrXIIEE distribution with some competitive models to demonstrate its usefulness in data modeling. The MLE method is used to estimate the parameters of the fitted models. The following model selection criteria and goodness-of-fit tests are used to select the best model.

Cramér-von Mises
where z_i = F(y_i) and the y_i's are the ordered observations.

Akaike Information Criterion (AIC): AIC = −2ℓ + 2p.

Hannan-Quinn Information Criterion (HQIC): HQIC = −2ℓ + 2p log(log n).

Bayesian Information Criterion (BIC): BIC = −2ℓ + p log n,
where p is the number of parameters, n is the sample size and ℓ is the maximized log-likelihood. Moreover, we consider the Kolmogorov-Smirnov (KS) statistic (with its p-value).

Two real data sets are analyzed with the BrXIIEE and competitive models. The first data set represents the lifetime data relating to relief times (in minutes) of patients receiving an analgesic (see Gross and Clark [8]). The second data set represents the survival times (in days) of 72 guinea pigs infected with virulent tubercle bacilli, reported by Bjerkedal [3]. We compare the fits of the BrXIIEE distribution with those of other competitive models, namely: Exponential (E), Odd Lindley Exponential (OLiE), Marshall-Olkin Exponential (MOE), Moment Exponential (MomE), Logarithmic Burr-Hatke Exponential (Log BrHE), Generalized Marshall-Olkin Exponential (GMOE), Beta Exponential (BE), Marshall-Olkin Kumaraswamy Exponential (MOKwE), Kumaraswamy Exponential (KwE), Burr X-E (BrXE) and Kumaraswamy Marshall-Olkin Exponential (KwMOE).

Based on the values obtained in Tables 6-9 along with Figures 3 and 4, the proposed lifetime model fits better than the above-mentioned extensions of the exponential model, so the new lifetime model is a good alternative to these models in modeling the relief times and survival times data sets. Then, the estimation methods presented in Section 3 are compared via these two data sets, and the results are reported in Tables 10 and 11. From these results, we conclude that the bootstrap and MLE methods produce better parameter estimates than the other methods.
From Tables 10 and 11, and based on the W* and A* statistics, we recommend using the bootstrapping method to estimate the parameters of the BrXIIEE distribution for data set I, although all methods of estimation performed well, and the ML method for data set II.
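The three information criteria are simple functions of the maximized log-likelihood ℓ, the number of parameters p and the sample size n. A minimal sketch using their standard definitions follows; the function names are hypothetical and the input values in any call would come from the fitted models.

```python
import math

def aic(loglik, p):
    """Akaike Information Criterion: -2*loglik + 2*p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayesian Information Criterion: -2*loglik + p*log(n)."""
    return -2.0 * loglik + p * math.log(n)

def hqic(loglik, p, n):
    """Hannan-Quinn Information Criterion: -2*loglik + 2*p*log(log(n))."""
    return -2.0 * loglik + 2.0 * p * math.log(math.log(n))
```

Smaller values indicate a better trade-off between fit and complexity, which is how the competing models are ranked.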

Censored data modeling with covariates
The data set, reported in McGilchrist and Aisbett [12], consists of times to the first and second recurrences of infection in 38 kidney patients using a portable dialysis machine. The data set can also be found in the R package survival. The LBrXIIEE regression model is used to analyze this data set. The variables involved in the study are: y_i, the time to infection since insertion of the catheter; cens_i, the censoring indicator (1 = uncensored, 0 = censored); and x_i1, the sex of each patient (1 = male, 0 = female). The following regression structure is fitted by the LBrXIIEE and LBrXIIE models.

Maximum Likelihood Estimation
The MLE method is used to estimate the unknown parameters of the LBrXIIEE and LBrXIIE regression models, and the results are listed in Table 12. Based on the reported AIC and BIC statistics, the LBrXIIEE regression model provides a better fit than the LBrXIIE regression model since it has lower values of these statistics. Also, regarding the estimated regression parameters, we conclude that the parameters β_0 and β_1 are statistically significant at the 5% significance level. The likelihood ratio (LR) test is used to compare the LBrXIIEE and LBrXIIE regression models. Table 13 shows the LR statistic and the corresponding p-value for the used data set. Based on the figures in Table 13, the computed p-value is smaller than 0.05, so the null hypothesis is rejected in favour of the LBrXIIEE regression model. Therefore, we conclude that the LBrXIIEE regression model provides a better fit than its sub-model according to the LR test results.
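The LR comparison of nested models can be sketched as follows. The sketch assumes that the sub-model restricts a single parameter, so the statistic is referred to a chi-square distribution with one degree of freedom; the function name is hypothetical and the log-likelihood values in any call would be taken from the fitted models.

```python
import math

def lr_test_one_df(loglik_null, loglik_full):
    """Return (LR statistic, p-value) for a 1-df likelihood ratio test."""
    stat = 2.0 * (loglik_full - loglik_null)
    # For 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value
```

A p-value below 0.05 leads to rejecting the restricted model at the 5% level.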

Residual Analysis
The modified deviance residuals of the LBrXIIEE regression model are calculated and plotted in Figure 5 together with their quantile-quantile (Q-Q) plot. These figures reveal that the LBrXIIEE regression model provides an adequate fit to the used data set and that no observation can be regarded as a potential outlier.
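The coordinates of a normal Q-Q plot for a set of residuals can be produced with the standard library alone. This sketch pairs the sorted residuals with standard normal quantiles at the plotting positions (i - 0.5)/n; the choice of plotting positions is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical normal quantile, ordered residual) pairs."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))
```

Points lying close to a straight line suggest that the modified deviance residuals are approximately normal, which is the visual check applied to the fitted regression model.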

Concluding remarks
This paper introduces a new extension of the exponentiated exponential distribution using the BrXII-G family. The new model is denoted by BrXIIEE for short. The properties of the BrXIIEE distribution are obtained and discussed in detail. Five estimation methods are considered to estimate the parameters of the BrXIIEE distribution, and the finite sample performance of these estimation methods is compared via simulation studies under different scenarios. Three data sets are analyzed to demonstrate the usefulness of the BrXIIEE model against competitive models.