Estimation of Parameters and Reliability Characteristics in Lindley Distribution Using Randomly Censored Data

This article deals with the estimation of parameters and reliability characteristics of Lindley distribution under random censoring. Expected time on test based on randomly censored data is obtained. The maximum likelihood estimators of the unknown parameters and reliability characteristics are derived. The asymptotic, bootstrap p and bootstrap t confidence intervals of the parameters are constructed. The Bayes estimators of the parameters and reliability characteristics under squared error loss function using non-informative and gamma informative priors are obtained. For computing of Bayes estimates, Lindley approximation and MCMC methods are considered. Highest posterior density (HPD) credible intervals of the parameters are obtained using MCMC method. Various estimation procedures are compared using a Monte Carlo simulation study. Finally, a real data set is analyzed for illustration purposes.


Introduction
In survival analysis and reliability theory, there are many situations when some subjects or items in the study have not experienced the event of interest at the end of the study. For example, some patients may not be disease-free at the end of the study period. The exact lifetimes (or time to event of interest) of these subjects are unknown. These are called censored observations or censored times. There are several types of censoring schemes, for example, conventional Type I and Type II censoring schemes. In these censoring schemes, items are removed from the test at the final termination of the test. But these schemes do not allow removal of units before the termination of the experiment, see, Al-Zahrani and Ali [3], Chaturvedi and Kumari [5], etc. Another type of censoring called random censoring occurs when an item under study is lost or removed randomly from the experiment before its failure. Random censoring is an important censoring in which the time of censoring is not fixed but taken as random. The Type I censoring is a particular case of random censoring in which censoring takes place at some prefixed time point, see Klein and Moeschberger [19].
The random censoring was introduced in literature by Gilbert [15]. Some early work on random censoring can be found in Efron [10], Breslow and Crowley [4] and others. Recently, random censoring has become very popular censoring scheme in literature, for example, exponential distribution under random censoring was considered by Saleem and Aslam [31]. Kim [18] obtained approximate maximum likelihood estimation for the scale parameter of 81 generalized exponential distribution under random censoring with known shape parameters. Kumar and Garg [23] studied estimation of the parameters of randomly censored generalized inverted Rayleigh distribution. Krishna et al. [21] discussed estimation in Maxwell distribution with randomly censored data. Garg et al. [13] discussed randomly censored generalized inverted exponential distribution. Kumar [22] obtained classical and Bayes estimates of the parameters of log-logistic distribution under random censoring. Kumar and Kumar [24] studied estimation in inverse Weibull distribution based on randomly censored data.
The random censoring can be described as follows: suppose n identical items are put on test with their lifetimes as X 1 , X 2 , . . . , X n which are independent and identically distributed (iid) random variables with probability density function (pdf) f X (x) and cumulative distribution function (cdf) F X (x). Also, let T 1 , T 2 , . . . , T n be the random censoring times of these items. Suppose pdf and cdf of T ′ i s are f T (t) and F T (t), respectively. Further, let X ′ i s and T ′ i s be mutually independent. Note that, between X ′ i s and T ′ i s, only one will be observed. The actual observed time be Y i = min(X i , T i ); i = 1, 2, . . . , n. Also, define the indicator variable D i as D i = (1 if X i ≤ T i otherwise 0). Note that D i is a random variable with Bernoulli probability mass function given by P [D i = j] = p j (1 − p) 1−j ; j = 0, 1 and p = P [X i ≤ T i ]. Since X ′ i s and T ′ i s are independent, so will be Y i and D i , ∀ i = 1, 2, . . . , n. Now, it is simple to show that the joint pdf of Y and D is given by where,F T (y) = 1 − F T (y) andF X (y) = 1 − F X (y). Also, the probability of failure of an item before its censoring is given by Lindley [27] introduced the Lindley distribution in connection with Fiducial distribution and Bayes theorem. In recent studies, Lindley distribution has been well established as a useful lifetime model in survival analysis and reliability theory. For example, Ghitany et al. [14] developed different distributional properties, reliability characteristics and some inferential procedures for the Lindley distribution in complete sample case. Krishna and Kumar [20] discussed reliability estimation in Lindley distribution with progressively type II right censored sample. Mazucheli and Achcar [29] described Lindley distribution applied to competing risk lifetime data. Ali et al. [2] studied the effect of the loss function on Bayes estimate, posterior risk and hazard function for Lindley distribution. Kumar et al. [25] discussed estimation of P (Y < X) in Lindley distribution using progressively first failure censoring. Dube et al. [8] described maximum likelihood and Bayes estimates of parameters and reliability characteristics of Lindley distribution under progressive first failure censoring. These studies suggest that in many real life situations Lindley distribution serves as a better lifetime model than the so far popular distributions like exponential, Rayleigh, gamma, Weibull etc.
The pdf and cdf of Lindley distribution (LD) with parameter θ are, respectively, given by and the corresponding reliability characteristics like mean time to system failure (MTSF), reliability function and failure rate function, respectively, are given by Now, let the failure time X and censoring time T follow LD(θ) and LD(λ), respectively. Then using equations (1), (3) and (4) we get the joint pdf of randomly censored variables (Y, D) as Also, using (2) the probability of failure of an item before its censoring in Lindley distribution is given by Table 1 gives the numerical values of this probability for various combinations of the true values of the parameters θ and λ.
Rest of the paper is organized as follows: In section 2, we derive maximum likelihood estimators of the parameters and reliability characteristics. Also, asymptotic confidence intervals of the parameters based on expected Fisher information are constructed. Section 3 deals with the bootstrap p and bootstrap t confidence intervals of the parameters. Section 4 describes the expected time on test based on randomly censored data from Lindley distribution. Bayes estimates of the parameters and reliability characteristics under squared error loss function using Lindley approximation and MCMC methods and HPD credible intervals based on MCMC method are discussed in section 5. In section 6, a Monte Carlo simulation study is performed to compare various estimates developed. A real data example for illustration purpose is discussed in section 7. Finally, conclusions of this article are given in section 8.

Maximum likelihood estimation
Let (ỹ,d) = (y 1 , d 1 ), (y 2 , d 2 ), . . . , (y n , d n ) be a randomly censored sample from the model in (8). Then the likelihood function is given by The normal equations are given by The equations (12) and (13) can be written as where, To solve equation (14) we propose the use of iterative procedure. Suppose θ (0) is an initial guess value of θ. Then successive approximations of θ are θ (1) . Stop the iterative procedure at the k th stage if | θ (k+1) − θ (k) |< ϵ, for some pre-specified small value ϵ > 0. Letθ be the estimated value of θ obtained by solving equation (14). Similarly, we can obtainλ after solving equation (15). Once MLEŝ θ andλ are obtained, the MLEs of M T SF , R(t) and h(t) for given mission time t 0 can be derived using invariance property of MLEs as 1 +θ +θt 0 , t 0 > 0.

Expected Fisher information
In this sub-section, we derive asymptotic confidence intervals of the parameters based on expected Fisher information matrix. According to Zheng and Gastwirth [33] the expected Fisher information in randomly censored data can be expressed in terms of hazard rate functions. The Fisher information of parameters ω = (θ, λ) contained in randomly censored sample of size n from Lindley distribution is given by where, here, h X and h T are the hazard rate function of X and T , respectively. We observe that the elements of the expected Fisher information matrix I(ω) are to be computed numerically. Thus, the variance-covariance matrix of MLEsω = (θ,λ) is the inverse of Fisher information matrix and it is given by ] .

Bootstrap confidence intervals
Here, we propose the use of two bootstrap confidence intervals for the unknown parameters. The two bootstrap methods that are widely used in practice are (i) the percentile bootstrap (boot-p) method proposed by Efron [11], and (ii) the bootstrap-t (boot-t) method proposed by Hall [16]. We use the following steps for two bootstrap confidence intervals for θ and λ as suggested by Efron and Tibshirani [9].
Similarly, the approximate 100 × (1 − ξ)% boot-p confidence intervals for λ is given by where, [a] is the intergral part of a.
Step 4. Repeat steps 2-3, B times to obtain a set of bootstrap statistics ) .

Expected time on test
In this section, we study the expected time on test (ETT) of a randomly censored experiment. In a life testing experiment it is beneficial to have an idea about the expected duration of the experiment before the starting of the test. The cost of life testing experiment directly depends on the time taken to complete the test. Let the failure time X ∼ LD(θ) and the censoring time T ∼ LD(λ). Again, let Z = max(Y 1 , Y 2 , . . . , Y n ), then the cdf of Z is given by For failure time LD(θ) and censoring time LD(λ), Therefore, The observed time on test (OBTT) is the value of Z = max(Y 1 , Y 2 , . . . , Y n ) in an observed sample. The simulated values based on 1, 000 samples from randomly censored Lindley distribution given in (8) for OBTT with their corresponding ETT are given in Table 2. This table shows that OBTT estimates ETT precisely for various values of the parameters θ and λ. The MLE of ETT can be obtained using invariance property of MLEs.

Bayesian estimation
In this section, we derive the Bayes estimates of the unknown parameters of the model in (8) using randomly censored data. There are various ways to choose the priors. Here, we consider piecewise independent gamma priors on both parameters θ & λ and are, respectively, given by Thus, the joint prior distribution of θ and λ can be written as The assumption of gamma prior is not unreasonable. Gamma prior accommodates variety of shapes depending upon hyper-parameter values. The family of gamma distributions is highly flexible in nature and can be used as a suitable prior for θ as well as λ. Many authors have used gamma prior for Lindley distribution, see Krishna and Kumar [20], Ali et al. [2], Kumar et al. [25], Dube et al. [8]. Also, it is to be noted that the non-informative prior for the parameter is a special case of gamma prior and can be obtained on putting hyper-parameters a = b = 0 in (17).
Based on the randomly censored data (ỹ,d), likelihood function in (10) and joint prior distribution of (θ, λ) in (17), the joint posterior distribution of (θ, λ) is given by From the joint posterior distribution of θ and λ given in equation (18), we observe that the posterior distributions of θ and λ are independent. Thus, the marginal posterior distribution of θ given data (ỹ,d) is given by Similarly, the marginal posterior distribution of λ given data (ỹ,d) is given by Therefore, the expectation of any function of θ say ϕ 1 (θ) and of λ say ϕ 2 (λ) respectively, are given by and For the above integrals in (21) and (22), the closed form solutions are not available. The above integrals can be solved numerically. We propose the use of Lindley's approximation method and Metropolis-Hastings algorithm to approximate the above integrals.

Lindley's approximation method
Now, we compute the Bayes estimates of the parameters θ and λ using Lindley's approximation method which is proposed by Lindley [28]. According to this method the approximate Bayes estimates of ϕ 1 (θ) under squared error loss function (SELF) is given bŷ where,

ESTIMATION OF PARAMETERS AND RELIABILITY CHARACTERISTICS IN LINDLEY DISTRIBUTION
When, π 1 (θ) = θ, the Bayes estimate of θ under SELF is given bŷ Similarly, the Bayes estimate of λ under SELF is given bŷ where, Also, the Bayes estimates of MTSF, R(t) and h(t) under SELF are given by where, where, where, All the values on the right hand side in equations (23), (24), (25), (26), (27) and (28) are to be computed at MLEs (θ,λ) of (θ, λ). Although, using Lindley's approximation methods, the Bayes estimates of the unknown parameters can be obtained easily but we cannot construct the highest posterior density (HPD) credible intervals. For this purpose we use the Metropolis-Hastings algorithm to compute Bayes estimates as well as HPD credible intervals.

Metropolis-Hastings Algorithm
Here, we consider Metropolis-Hastings (M-H) algorithm to compute Bayes estimates of the parameters and reliability characteristics. The posterior distribution of the parameter θ given in (19) is not in a well known distribution form and therefore random numbers from this distribution can be generated by using M-H algorithm. To sample from posterior distribution of θ given data (ỹ,d) in (19) we generate candidate point from a normal distribution. The following steps are used for computation purpose: Step 1. Start with initial guess θ (0) .
Here, we consider proposal density as normal distribution with mean MLEθ and variance as observed variancê V ar(θ), respectively, see, Ntzoufras (pp. 44-45) [30]. We discard first M 0 , θ (j) 's ; j = 1, 2, . . . , M 0 , where, M 0 (< M ) is the burn-in-period, to obtain an independent sample from the stationary distribution of the Markov chain which is typically the posterior distribution. Now, the approximate Bayes estimate of ϕ 1 (θ) using M-H algorithm is obtained asφ Similarly, the approximate Bayes estimate of ϕ 2 (λ) using M-H algorithm is obtained aŝ Therefore, the Bayes estimates of the parameters θ, λ and reliability characteristics like MTSF, reliability function and failure rate function with mission time t 0 under SELF using M-H algorithm are, respectively, given bŷ , t 0 > 0.

Monte Carlo simulation study
A Monte Carlo simulation study is performed to compare the effects of various estimation procedures developed in the previous sections for randomly censored samples from Lindley distribution. It is important to note that we use the statistical software R for computations purposes in this article. We choose the value of the failure parameter θ to be unity and three different values for the censoring parameter λ like 0.5, 1 and 2 so that probability of uncensored observations are p = 0.7284, 0.50 and 0.2840, respectively. We generate N = 1, 000 randomly censored samples of sizes n = 20, 30, 50 and 80 from the model in (8). Bayes estimates of the parameters are computed using independent gamma priors and non-informative priors under SELF. Notations prior0 and prior1 are used for non-informative and gamma informative priors, respectively. Hyper-parameters are chosen so that the prior's means are exactly equal to the true values of the parameters. We chose hyper-parameters (a 1 = 2, b 1 = 2, a 2 = 2, b 2 = 4), (a 1 = 2, b 1 = 2, a 2 = 2, b 2 = 2) and (a 1 = 2, b 1 = 2, a 2 = 4, b 2 = 2) according to the true values of the parameters.
In each case, we calculate the maximum likelihood and Bayes estimates of the parameters θ, λ and reliability characteristics, M T SF , reliability function R(t) and failure rate function h(t) with mission time t 0 = 0.80. Also, we compute the length of 95% classical confidence intervals and HPD credible intervals of the unknown parameters. Here, we use B = 1, 000 for bootstrap samples. Bayes estimates of the parameters and reliability characteristics are computed using Lindley approximation and M-H algorithm methods, respectively. Here, we use M = 10, 000 for M-H algorithm with burn-in-period M 0 = 2, 000. We compute the average value (AV) and the corresponding mean squared error (MSE) of different estimates, where, AV = ∑ψ (θ)/N and M SE = ∑ (ψ(θ) − ψ(θ)) 2 /N , here,ψ(θ) is the MLE or Bayes estimate of the parametric function ψ(θ). The results of the extensive simulation study are presented in Tables 2, 3 , 4, 5, 6, 7, 8 and 9. According to these Tables following conclusions are made: Maximum likelihood estimation method gives almost unbiased estimates even for small sample size 20. As the sample size n and probability of failure p increases the bias and MSE decrease as expected. Also, as the value of the censoring parameter λ increases, bias and MSE also increase. The coverage probabilities based on all confidence and HPD credible intervals of the parameters attain their prescribed confidence levels almost in all cases. Average length of all types of confidence and HPD credible intervals narrow down as sample size n and probability of failure p increase. Among the confidence and HPD credible intervals HPD credible intervals are best in respect of average length. Bayes estimates are also very good in respect of bias and MSE. Bayes estimates computed using M-H algorithm is better than estimates using Lindley's approximation method in respect of both cases bias as well as MSE. Bayes estimates are better than MLEs in respect of bias and MSE as they include prior information about the parameters. Thus, we recommend the Bayesian point and interval estimation methods for estimation purposes.

Real data analysis
In this section, we discuss a real data set for illustration purpose. The data set given below is a randomly censored data set and has been taken from Fleming and Harrington [12]. These data belong to Group IV of the primary biliary cirrhosis (PBC) liver study group conducted by Mayo Clinic. The event of interest is the time to death of  For computational convenience we analyze the data set after dividing each data point by 1,000. Now, we fit randomly censored Lindley distribution given in (8) to the real data set and compare its fitting with some well known lifetime distributions, namely, exponential, Rayleigh, gamma and Weibull distributions. The maximum likelihood estimation method is used to estimate the parameters of the above distributions. These estimates, along with the data, are used to calculate estimated negative log likelihood function (−lnL), the Akaike information criterion (AIC), proposed by Akaike [1], defined by AIC = 2 × k − 2 × ln(L), Bayesian information criterion (BIC) proposed by Schwarz [32], defined by BIC = k × ln(n) − 2 × ln(L), where k is the number of parameters in the survival model, n is the number of observations in the given data set, and L is the maximized value of the likelihood function for the estimated model and Kolmogorov-Smirnov (K-S) statistic with its p-value. The best distribution corresponds to the lowest −lnL, AIC, BIC and K-S statistic values and the highest p-value. The K-S statistic with its p-value values were obtained by using ks.test function of statistical software R. These -lnL, AIC, BIC and K-S statistics with p-values are listed in Table 10. According to -lnL, AIC and BIC, the Lindley distribution is the best choice among the competing distributions and according to K-S test gamma and Weibull distributions are slightly better than Lindley distribution. The main advantage of using Lindley distribution over gamma and Weibull distributions is that it involves only one parameter. Hence, inferential procedures become simple to deal with, especially from computational point of view.
Also, we consider a graphical method for fitting the randomly censored data. The Kaplan-Meier (K-M) product limit estimator was proposed by Kaplan and Meier [17] and for randomly censored data it is given bŷ  (1), we observe that the estimate of survival function for Lindley distribution is quite close to that proposed by K-M estimator. Thus, the randomly censored Lindley distribution is the best choice among the competing models and can be used to obtain inferential results from the considered real data set. Now, for the above data set we obtain the ML and Bayes estimates of the parameters and reliability characteristics. Since, we do not have any prior information, the Bayes estimates of the unknown parameters are computed with non-informative priors. For non-informative priors, we take hyperparameters a 1 = b 1 = a 2 = b 2 = 0 in (17). The Bayes estimates are obtained using Lindley approximation method and M-H algorithm. For M-H algorithm we take M = 10, 000 with burn-in-period 2, 000. For the reliability characteristics we take sample median value as the mission time t 0 = 0.8745. The ML and Bayes estimates of the parameters and reliability characteristics are given in Table ( 11). Also, the 95% asymptotic, boot-p, boot-t confidence intervals and HPD credible intervals for the parameters are reported in Table ( 12). Also, we check the convergence of the generated sequences of θ and λ graphically. The trace plots, autocorrelation function (ACF) plots and histograms with density plots of the 10, 000 iterations of the parameters θ and λ are presented in Figure  (2).
From Figure (2) we observe that the trace plots look like a random scatter about their mean values (represented by solid line) and fine mixing of the chains for the parameters. ACF plots show that chains have very low autocorrelations. Histograms show that the densities of θ and λ are almost symmetric. Thus, from Figure (2) we can see that M-H algorithm is convergent.  Figure 2. Trace, autocorrelation plots and histograms of the parameters θ and λ.

Concluding remarks
This article deals with estimation of the unknown parameters and reliability characteristics of Lindley distribution using randomly censored data. Expected time on test is an important practical issue in medical studies, especially in clinical trials. This work is helpful to statistician in analyzing randomly censored lifetime data. The estimation of the unknown parameters and reliability characteristics is carried out by both Bayesian and frequentist approaches.

ESTIMATION OF PARAMETERS AND RELIABILITY CHARACTERISTICS IN LINDLEY DISTRIBUTION
Maximum likelihood estimates of the parameters though not obtained analytically and found numerically using iterative methods, are quite accurate and can be easily used in practical situations. In real data analysis confidence intervals are found to be more useful for framing policies. Therefore, we derive three types of confidence intervals in this article. Bayesian estimation is very popular now days as it includes the prior information about the parameters. With squared error loss function, Bayes estimates are obtained here using Lindley's approximation method and Metropolis-Hastings algorithm. In most of the practical lifetime experiments complete data are almost never realized and random censoring is most frequent. Thus, this article has direct applications in lifetime data analysis.