The Ristic-Balakrishnan odd log-logistic family of distributions: Properties and Applications

We introduce and study general mathematical properties of a new generator of continuous distributions with two extra parameters, called the Ristic-Balakrishnan odd log-logistic family of distributions. We present some special models and investigate the asymptotes. The new density function can be expressed as a linear combination of exponentiated densities based on the same baseline distribution. Explicit expressions for the ordinary and incomplete moments, generating functions and order statistics, which hold for any baseline model, are determined. Further, we discuss the estimation of the model parameters by maximum likelihood and carry out a simulation study to assess the performance of the estimates. A regression model based on the proposed family is introduced. We illustrate the potential of the family by means of three applications to real data.


Introduction
The statistics literature is filled with hundreds of continuous univariate distributions: see Johnson et al. [13], [14]. Adding parameters to a well-established distribution is a time-honored device for obtaining more flexible new families of distributions. Recent developments have been made to define new generated families of distributions to control skewness and kurtosis through the tail weights and provide great flexibility in modeling skewed data in practice, including the two-piece approach introduced by Hansen [12] and the generators pioneered by Eugene et al. [6], Cordeiro and de Castro [5] and Alexander et al. [2]. Many subsequent articles apply these techniques to induce skewness into well-known symmetric distributions such as the symmetric Student t; see Aas and Haff [1] for a review.
We study several general mathematical properties of the gamma generated ("Gamma-G" for short) family of distributions. This family was motivated by the pioneering work of Ristic and Balakrishnan [23]. It is also important to mention that the results presented in this paper follow similar lines to the results developed by Nadarajah et al. [19], although their model is completely different from the one discussed in this paper.
The proposed family can extend several common distributions such as the normal, Weibull and Gumbel distributions by adding two extra generator parameters. Indeed, for any baseline G distribution, we can define the associated Ristic-Balakrishnan odd log-logistic-G ("RBOLL-G") distribution. The main characteristics of the RBOLL-G family, such as moments and generating and quantile functions, have tractable mathematical properties. The role of the generator parameters has been investigated and is related to the skewness and kurtosis of the generated distribution. The family studied here can be considered a special subfamily of the one proposed recently by Ristic and Balakrishnan [23] ("RB-G" from now on). The Gamma-G family can be constructed as follows. Let G(x) be any continuous distribution defined on a finite or an infinite interval. The RB family of distributions is defined from the cumulative distribution function (cdf) (for α > 0)
$$F(x) = 1 - \frac{1}{\Gamma(\alpha)} \int_0^{-\log G(x;\tau)} t^{\alpha-1} e^{-t}\,dt,$$
where $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt$ denotes the gamma function, G(x; τ) (x ∈ ℝ) denotes the baseline cdf and τ denotes the parameters of the parent G(·). Applying this generator, with gamma shape parameter β > 0, to the odd log-logistic-G cdf $G(x;\tau)^{\alpha}/[G(x;\tau)^{\alpha} + \bar G(x;\tau)^{\alpha}]$, where $\bar G = 1 - G$, gives the RBOLL-G cdf
$$F(x) = 1 - \frac{1}{\Gamma(\beta)} \int_0^{-\log\left\{G(x;\tau)^{\alpha}/[G(x;\tau)^{\alpha} + \bar G(x;\tau)^{\alpha}]\right\}} t^{\beta-1} e^{-t}\,dt. \tag{3}$$
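Let g(x; τ) = dG(x; τ)/dx be the corresponding baseline probability density function (pdf). The density function corresponding to (3) becomes
$$f(x) = \frac{\alpha\, g(x;\tau)\, G(x;\tau)^{\alpha-1}\, \bar G(x;\tau)^{\alpha-1}}{\Gamma(\beta)\,[G(x;\tau)^{\alpha} + \bar G(x;\tau)^{\alpha}]^{2}} \left\{-\log\left[\frac{G(x;\tau)^{\alpha}}{G(x;\tau)^{\alpha} + \bar G(x;\tau)^{\alpha}}\right]\right\}^{\beta-1}, \tag{4}$$
where $\bar G(x;\tau) = 1 - G(x;\tau)$. The hazard rate function (hrf) of X is given by h(x) = f(x)/[1 − F(x)], where F(x) is the cdf (3). The RBOLL-G distribution has the same parameters as the G distribution plus two additional parameters α and β. From now on, a random variable X with density function (4) is denoted by X ∼ RBOLL-G(α, β, τ). For α = β = 1 the RBOLL-G distribution reduces to the baseline G distribution, for α = 1 we obtain the RB-G distributions and for β = 1 we obtain the odd log-logistic (OLL-G) distributions.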
Each new RBOLL-G distribution can be obtained from a specified G distribution. From the statistical modeling point of view, the RBOLL-G distribution has two important aspects. First, the proposed model has more parameters than the baseline distribution and the additional parameters α and β of the generated model have clear interpretations. In fact, the RBOLL-G family of distributions is easily simulated by inverting (3) as follows: if V has a U(0, 1) distribution, Q_G(·) denotes the quantile function of the baseline G and Q_Γ(β,1)(·) denotes the quantile function of a Γ(β, 1) random variable, then a variate X with cdf (3) is obtained as the solution of the nonlinear equation F(X) = V, which involves only Q_G and Q_Γ(β,1).
Some plots of the pdf and hrf of the RBOLL-Weibull (RBOLL-W) model in (7) are given in Figure 1. As shown in Figure 1, the pdf of the RBOLL-W distribution can be unimodal, bimodal or almost symmetric. Its hazard rate function can be constant, decreasing, increasing or bathtub-shaped, which is important in reliability theory.
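The inversion scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we assume that the RBOLL-G cdf is the RB generator with shape β applied to the OLL-G cdf G^α/(G^α + (1 − G)^α), and that the Weibull baseline is G(x) = 1 − exp{−(x/γ)^λ}. The names `rboll_sample` and `weibull_quantile` are hypothetical. Instead of computing Q_Γ(β,1)(1 − V) explicitly, the sketch draws Z ∼ Γ(β, 1) directly, which has the same distribution.

```python
import math
import random


def weibull_quantile(p, lam, gam):
    """Quantile of the assumed Weibull baseline G(x) = 1 - exp(-(x/gam)**lam)."""
    return gam * (-math.log1p(-p)) ** (1.0 / lam)


def rboll_sample(alpha, beta, baseline_quantile, rng=random):
    """Draw one RBOLL-G variate under the assumed cdf form.

    Z ~ Gamma(beta, 1) and u = exp(-Z) is the value of the OLL-G cdf at X.
    The OLL-G transform is inverted in closed form, then the baseline
    quantile function is applied.
    """
    z = rng.gammavariate(beta, 1.0)
    u = math.exp(-z)
    # Keep u strictly inside (0, 1) so that the transforms below are defined.
    u = min(max(u, 1e-300), 1.0 - 1e-16)
    a = u ** (1.0 / alpha)
    b = (1.0 - u) ** (1.0 / alpha)
    p = a / (a + b)          # baseline probability G(X)
    p = min(p, 1.0 - 1e-16)  # avoid an infinite baseline quantile
    return baseline_quantile(p)


if __name__ == "__main__":
    random.seed(1)
    # Illustrative parameter values only.
    draws = [rboll_sample(2.0, 3.0, lambda p: weibull_quantile(p, 5.0, 7.0))
             for _ in range(5)]
    print(draws)
```

For α = β = 1 the sketch reduces to ordinary inverse-transform sampling from the baseline, which gives a simple sanity check.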

Main Properties
In this section, we study some general properties of the RBOLL-G family.

Expansions for pdf and cdf
Some useful expansions for (4) and (3) can be derived using the concept of exponentiated distributions. For an arbitrary baseline cdf G(x), a random variable is said to have the exponentiated-G distribution with parameter a > 0, say X ∼ exp-G(a), if its pdf and cdf are $h_a(x) = a\,g(x)\,G(x)^{a-1}$ (9) and $H_a(x) = G(x)^{a}$, respectively. The properties of exponentiated distributions have been studied by many authors in recent years, see Mudholkar and Srivastava [17] for exponentiated Weibull, Gupta et al. [9] for exponentiated Pareto, Gupta and Kundu [10] for exponentiated exponential, Nadarajah [20] for exponentiated Gumbel, Kakde and Shirke [15] for exponentiated lognormal, and Nadarajah and Gupta [21] for exponentiated gamma distributions. Also, we can refer to recent articles by Mozafari et al. [16] and Alizadeh et al. [3]. We note that for a > 1 and a < 1 and for larger values of x, the multiplicative factor $a\,G(x)^{a-1}$ is greater and smaller than one, respectively. The reverse assertion is also true for smaller values of x. The latter immediately implies that the ordinary moments associated with the density h_a(x) are strictly larger (smaller) than those associated with the density g(x) when a > 1 (a < 1). The binomial coefficient generalized to real arguments is given by $\binom{x}{y} = \Gamma(x+1)/[\Gamma(y+1)\Gamma(x-y+1)]$. First, using a Taylor expansion, we have For any real parameter a > 0, the following formula holds (http://functions.wolfram.com/ElementaryFunctions/Log/06/01/04/03/) where the constants p_{j,k} can be calculated recursively by , a 0 = λ0 ρ0 and h_r(α, β + i + k) is defined in the Appendix, and then (3) can be expressed as for r ≥ 1 and H_r(x) denotes the cdf of the exp-G(r) distribution. The corresponding (4) can be expressed as where h_{r+1}(x) denotes the pdf of the exp-G(r + 1) distribution. So, several properties of the RBOLL-G distribution can be obtained by knowing those of the exp-G distribution; see, for example, Mudholkar et al. [18], Gupta and Kundu [11] and Nadarajah and Kotz [22], among others.

Moments
Hereafter, we shall assume that G(x) is the baseline cdf of a random variable Y and that F (x) is the cdf of the random variable X having density function (4). The moments of the RBOLL-G distribution can be obtained from the (r, k)th probability weighted moments (PWMs) of Y defined by In fact, we have where b r is defined by equation (16). Thus, the moments of any RBOLL-G distribution can be expressed as an infinite linear combination of the baseline PWMs. A second formula for τ s,k can be written in terms of The PWMs for various distributions can be determined by using equations (18) and (20). The following special cases were already published by Cordeiro and Nadarajah [5].

Incomplete moments
Here, we propose two methods to determine the incomplete moments of the new family. First, the nth incomplete moment of X can be expressed as The integral in (21) can be computed at least numerically for most baseline distributions.

Generating function
In this section, we provide two formulae for the moment generating function (mgf) M(s) = E(e^{sX}) of a random variable X with the RBOLL-G distribution. A first formula for M(s) comes from equation (17) as where M_{r+1}(s) is the generating function of the exp-G distribution with power parameter r + 1. Hence, M(s) can be determined from the exp-G generating function. A second formula for M(s) can be derived from equation (22) as where the quantity $\rho_r(s) = \int_{-\infty}^{\infty} \exp(sx)\, G(x)^{r} g(x)\,dx$ follows from the baseline qf as $\rho_r(s) = \int_0^1 u^{r} \exp[s\,Q_G(u)]\,du$.

Mean deviations
The mean deviations about the mean (δ_1(Y) = E(|Y − µ′_1|)) and about the median (δ_2(Y) = E(|Y − M|)) of Y can be expressed as $\delta_1(Y) = 2\mu'_1 F(\mu'_1) - 2 m_1(\mu'_1)$ and $\delta_2(Y) = \mu'_1 - 2 m_1(M)$, respectively, where µ′_1 = E(Y), M is the median of Y, F(µ′_1) is the cdf evaluated at µ′_1 and $m_1(z) = \int_{-\infty}^{z} x f(x)\,dx$ is the first incomplete moment. Now, we provide two alternative ways to compute δ_1(Y) and δ_2(Y). A general equation for m_1(z) can be derived from equation (17) as where Equation (27) is the basic quantity to compute the mean deviations for the RBOLL-G distributions. The mean deviations defined in (25) depend only on the first incomplete moment of the exp-G distributions. So, alternative representations for δ_1(Y) and δ_2(Y) are A second general formula for m_1(z) can be derived by setting u = G(x) in (26) where T_r(z) is given by

Order statistics
Suppose X 1 , . . . , X n is a random sample from the RBOLL-G family. Denote the random variables in the ascending order as X 1:n ≤ . . . ≤ X n:n . The pdf of X i:n is given by denotes the exp-G density function with parameter r + k + 1 and

where b k is defined by (16). Here, the quantities f j+i−1,k are obtained recursively by Thus, one can easily obtain ordinary and incomplete moments and generating function of order statistics for any given G.

Estimation
Here, we determine the maximum likelihood estimates (MLEs) of the model parameters of the new family from complete samples only. Let x_1, …, x_n be observed values from the RBOLL-G distribution with parameters α, β and τ. Let Θ = (α, β, τ)^⊤ be the (p + 2) × 1 parameter vector, where p is the dimension of τ. The total log-likelihood function for Θ is given by The log-likelihood function can be maximized either directly by using SAS (PROC NLMIXED) or the Ox program (sub-routine MaxBFGS) or by solving the nonlinear likelihood equations obtained by differentiating (30). The components of the score function U_n(Θ) = (∂ℓ_n/∂α, ∂ℓ_n/∂β, ∂ℓ_n/∂τ)^⊤ are where g^(τ)(·) means the derivative of the function g with respect to τ. where

the functions g(·) and G(·) are defined in Section 1 and ψ(·) is the digamma function. The MLE θ̂ of θ is obtained by solving the nonlinear likelihood equations U_α(θ) = 0, U_β(θ) = 0 and U_τ(θ) = 0. These equations cannot be solved analytically, so statistical software is needed to solve them numerically, for instance with iterative techniques such as a Newton-Raphson type algorithm. We employ the NLMIXED procedure in SAS.
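To make the objective function concrete, the following Python sketch evaluates the log-likelihood that such a numerical routine would maximize. It is not the SAS implementation used in the paper. It assumes the RBOLL-G density f(x) = α g G^{α−1} (1 − G)^{α−1} {−log[G^α/(G^α + (1 − G)^α)]}^{β−1} / {Γ(β) [G^α + (1 − G)^α]^2}; the function name `rboll_loglik` and the callables `G` and `g` (baseline cdf and pdf) are hypothetical.

```python
import math


def rboll_loglik(data, alpha, beta, G, g):
    """Log-likelihood of a sample under the assumed RBOLL-G density.

    G and g are callables returning the baseline cdf and pdf at a point;
    every observation must satisfy 0 < G(x) < 1.
    """
    total = 0.0
    for x in data:
        Gx = G(x)
        Sx = 1.0 - Gx
        denom = Gx ** alpha + Sx ** alpha
        # -log of the OLL-G cdf G^alpha / (G^alpha + (1 - G)^alpha)
        neg_log_oll = math.log(denom) - alpha * math.log(Gx)
        total += (
            math.log(alpha)
            - math.lgamma(beta)
            + math.log(g(x))
            + (alpha - 1.0) * (math.log(Gx) + math.log(Sx))
            - 2.0 * math.log(denom)
            + (beta - 1.0) * math.log(neg_log_oll)
        )
    return total


if __name__ == "__main__":
    # Illustrative Weibull baseline with shape 2 and scale 1.
    G = lambda x: -math.expm1(-x ** 2)
    g = lambda x: 2.0 * x * math.exp(-x ** 2)
    print(rboll_loglik([0.5, 1.0, 1.5], 2.0, 3.0, G, g))
```

The returned value can be passed (with a sign change) to any general-purpose minimizer; for α = β = 1 it coincides with the baseline log-likelihood.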
For interval estimation of (α, β, τ) and hypothesis tests on these parameters, we obtain the observed information matrix, since the expected information matrix is very complicated and requires numerical integration. The (p + 2) × (p + 2) observed information matrix J(θ), where p is the dimension of the vector τ, is $J(\theta) = -\partial^2 \ell_n(\theta)/\partial\theta\,\partial\theta^{\top}$, whose elements can be computed easily. Under conditions that are fulfilled for parameters in the interior of the parameter space but not on the boundary, the MLE θ̂ is asymptotically multivariate normal, and its approximate distribution with the expected information matrix replaced by J(θ̂), i.e., the observed information matrix evaluated at θ̂, can be used to construct approximate confidence intervals for the individual parameters.
We can compute the maximum values of the unrestricted and restricted log-likelihoods to obtain likelihood ratio (LR) statistics for testing some sub-models of the RBOLL-G distribution. Tests of hypotheses of the type H_0 : ψ = ψ_0 versus H : ψ ≠ ψ_0, where ψ is a subset of the parameters of θ, can be performed through LR statistics. For example, we may use the LR statistic to check if the fit using the RBOLL-G distribution is statistically "superior" to a fit using the G distribution for a given data set.
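The LR comparison above can be sketched as follows. The statistic w = 2(ℓ_unrestricted − ℓ_restricted) is referred to a chi-square distribution whose degrees of freedom equal the number of restricted parameters, which is the usual large-sample approximation. The function names `chi2_sf` and `lr_test` are hypothetical, and the upper-tail probability is computed from the standard closed forms for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    y = 0.5 * x
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from the df = 1 case and add the half-integer terms.
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)  # y^(1/2) / Gamma(3/2)
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * term
        term *= y / (j + 0.5)
    return total


def lr_test(loglik_unrestricted, loglik_restricted, df):
    """Return the LR statistic and its asymptotic chi-square p-value."""
    w = 2.0 * (loglik_unrestricted - loglik_restricted)
    return w, chi2_sf(w, df)


if __name__ == "__main__":
    # Illustrative log-likelihood values, not taken from the paper's tables.
    print(lr_test(-100.0, -103.0, 2))
```

For instance, testing the baseline G model against the RBOLL-G model corresponds to H_0 : α = β = 1, i.e. df = 2.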

Simulation
We assess the performance of the MLEs of the RBOLL-W parameters. The precision of the MLEs is evaluated by means of the following measures: bias, mean square error (MSE), estimated average length (AL) and coverage probability (CP). We generate N = 1,000 samples of sizes n = 50, 55, …, 1000 from the RBOLL-W distribution with α = 2, β = 3, λ = 5, γ = 7 by using the inverse transform method. The MLEs of the model parameters are obtained for each generated sample, say (α̂_i, β̂_i, λ̂_i, γ̂_i), for i = 1, …, N. The standard errors of the MLEs are evaluated by inverting the observed information matrix, namely for ϵ = α, β, λ, γ. The CPs and ALs are, respectively, given by and The values of the above measures are displayed in the plots of Figure 3. We note that the estimated biases decrease as the sample size n increases. Further, the estimated MSEs decay toward zero as n increases, which reveals the consistency of the MLEs. The CPs are near 0.95 and approach the nominal value as the sample size increases. Moreover, the ALs decrease as the sample size increases. The reported results are obtained for a selected parameter vector (α, β, λ, γ). However, similar results hold for other parameter values.
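The four measures can be computed from the replicated estimates as in the sketch below. It assumes Wald-type 95% intervals of the form estimate ± 1.959964 × standard error, which is a common choice but is an assumption here; the function name `summarize_mles` is hypothetical.

```python
def summarize_mles(estimates, std_errors, true_value, z=1.959964):
    """Bias, MSE, coverage probability and average interval length
    for one parameter across Monte Carlo replications."""
    n_rep = len(estimates)
    bias = sum(e - true_value for e in estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    # Count the replications whose Wald interval contains the true value.
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s < true_value < e + z * s
    )
    cp = covered / n_rep
    al = 2.0 * z * sum(std_errors) / n_rep
    return {"bias": bias, "mse": mse, "cp": cp, "al": al}


if __name__ == "__main__":
    # Toy numbers for illustration only.
    print(summarize_mles([1.9, 2.1, 2.5], [0.1, 0.1, 0.1], 2.0))
```

In a full study this function would be called once per parameter and per sample size, with the estimates and standard errors returned by the optimizer.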

Log-RBOLL-W regression model
Let X be a random variable having the RBOLL-W density function with four parameters α > 0, β > 0, λ > 0 and γ > 0, given in (7). The density function of Y = log(X), replacing λ = 1/σ and γ = exp(µ), is given by (for y ∈ ℝ) where µ ∈ ℝ is the location parameter, σ > 0 is the scale parameter and α > 0 and β > 0 are the shape parameters. We refer to Equation (31) as the LRBOLLW distribution, say Y ∼ LRBOLLW(α, β, σ, µ). The corresponding survival function is The standardized random variable Z = (Y − µ)/σ has density function Parametric regression models to estimate univariate survival functions for censored data are widely used. A parametric model that provides a good fit to lifetime data tends to yield more precise estimates of the quantities of interest. Based on the LRBOLLW density, we propose a linear location-scale regression model linking the response variable y_i and the explanatory variable vector $v_i^{\top} = (v_{i1}, \ldots, v_{ip})$ by $y_i = v_i^{\top}\boldsymbol{\beta} + \sigma z_i$, i = 1, …, n, (34) where the random error z_i has density function (33), and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^{\top}$, σ > 0, α > 0 and β > 0 are unknown parameters.
Consider a sample (y_1, v_1), …, (y_n, v_n) of n independent observations, where each random response is defined by y_i = min{log(x_i), log(c_i)}. We assume non-informative censoring such that the observed lifetimes and censoring times are independent. Let F and C be the sets of individuals for which y_i is the log-lifetime or log-censoring, respectively. The log-likelihood function for the vector of parameters $\boldsymbol{\tau} = (\alpha, \sigma, \boldsymbol{\beta}^{\top})^{\top}$ from model (34) has the form $l(\boldsymbol{\tau}) = \sum_{i \in F} l_i(\boldsymbol{\tau}) + \sum_{i \in C} l_i^{(c)}(\boldsymbol{\tau})$, where $l_i(\boldsymbol{\tau}) = \log f(y_i)$, $l_i^{(c)}(\boldsymbol{\tau}) = \log S(y_i)$, f(y_i) is the density (31) and S(y_i) is the survival function (32) of Y_i. Then, the total log-likelihood function for $\boldsymbol{\tau}$ is given by where $z_i = (y_i - v_i^{\top}\boldsymbol{\beta})/\sigma$, r is the number of uncensored observations (failures) and c is the number of censored observations. The MLE $\hat{\boldsymbol{\tau}}$ of the vector of unknown parameters can be evaluated by maximizing the log-likelihood (35).
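The structure of the censored log-likelihood can be illustrated with a short Python sketch. Uncensored observations contribute −log σ plus the log-density of the standardized error at z_i, and censored observations contribute the log-survival function at z_i. The function name `censored_loglik` and its arguments are hypothetical, and in the usage example the standard smallest extreme value error (the log-Weibull case) is used purely as a stand-in for the density (33).

```python
import math


def censored_loglik(y, v, delta, coef, sigma, log_f_std, log_s_std):
    """Log-likelihood of a location-scale regression with right censoring.

    y: log-times; v: covariate rows; delta: 1 = observed, 0 = censored;
    log_f_std / log_s_std: log-density and log-survival of the standardized error.
    """
    total = 0.0
    for yi, vi, di in zip(y, v, delta):
        mu_i = sum(b * x for b, x in zip(coef, vi))
        z = (yi - mu_i) / sigma
        if di == 1:
            total += -math.log(sigma) + log_f_std(z)
        else:
            total += log_s_std(z)
    return total


if __name__ == "__main__":
    # Stand-in error distribution: standard smallest extreme value.
    log_f = lambda z: z - math.exp(z)
    log_s = lambda z: -math.exp(z)
    y = [0.0, 1.0]
    v = [[1.0], [1.0]]
    delta = [1, 0]
    print(censored_loglik(y, v, delta, [0.0], 1.0, log_f, log_s))
```

Replacing `log_f` and `log_s` with the logarithms of the standardized LRBOLLW density and survival function would give the objective maximized in this section.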

Applications
In this section, we compare the fits of the RBOLL-G distributions with those of other competitive distributions by means of three real data sets to illustrate the potential of the new family. Goodness-of-fit statistics are used to compare the fitted models and verify which model gives the best fit to the data. The Cramér-von Mises (W⋆) and Anderson-Darling (A⋆) statistics, the negative log-likelihood (−ℓ), the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are obtained for all fitted models to choose the best model. The smaller the values of these statistics, the better the fit to the data.
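For reference, the two information criteria can be computed from a maximized log-likelihood as in the following sketch, which uses the standard definitions AIC = 2k − 2ℓ and BIC = k log n − 2ℓ, where k is the number of estimated parameters and n the sample size. The function names are hypothetical and the numerical inputs are illustrative only.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


if __name__ == "__main__":
    # Illustrative values: a four-parameter model fitted to 85 observations.
    print(aic(-100.0, 4), bic(-100.0, 4, 85))
```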

First application
The first data set represents the failure times for a particular windshield model and includes 85 observations classified as failure times of windshields. The data set is given in Appendix B. Table 1 shows the estimated parameters and their standard errors, together with the −ℓ, AIC and BIC values. Based on the figures in Table 1, the RBOLL-W distribution provides the best fit among the fitted models. Figure 2(a) displays the histogram with the fitted pdfs and Figure 2(b) displays the fitted pdf, hrf, survival function and P-P plot of the RBOLL-W distribution. These figures reveal that the RBOLL-W model provides a superior fit to the first data set. Table 2 shows the LR statistics and the corresponding p-values for the first data set. From Table 2, the computed p-values are smaller than 0.05, so the null hypotheses are rejected for all sub-models. We conclude that the RBOLL-W model fits the first data set better than its sub-models according to the LR test results.

Second Application
The second data set contains 32 observations corresponding to the birth weights of newborn babies in ounces. The data set is given in Appendix B.
The MLEs (and their standard errors) of the parameters and the numerical values of the statistics and LR test results are presented in Tables 3 and 4, respectively. The figures in Table 3 reveal that the RBOLL-N model gives the lowest values for all goodness-of-fit statistics as compared to the other models. Hence, the RBOLL-N model can be chosen as the best model for the second data set. Table 4 shows the LR statistics and the corresponding p-values for the second data set. From Table 4, the computed p-values are smaller than 0.05, so the null hypotheses are rejected for all sub-models. We conclude that the RBOLL-N model fits the second data set better than its sub-models according to the LR test results. Figure 3(b) provides the fitted densities, the cumulative and survival functions and the P-P plot for the RBOLL-N distribution. It is clear from these plots that the RBOLL-N distribution yields the best fit among the fitted models for these data.

Stanford heart transplant data
Recently, Brito et al. [4] introduced the Log-Topp-Leone odd log-logistic-Weibull (Log-TLOLL-W) regression model and used the Stanford heart transplant data set to show its usefulness. Here, we use the same data set to demonstrate the flexibility of the LRBOLLW regression model against the Log-TLOLL-W regression model. This data set is available in the p3state.msm package of the R software. The sample size is n = 103 and the percentage of censored observations is 27%. The aim of this study is to relate the survival times (t) of patients to the following explanatory variables: x_1: year of acceptance to the program; x_2: age of patient (in years); x_3: previous surgery status (1 = yes, 0 = no); x_4: transplant indicator (1 = yes, 0 = no); c_i: censoring indicator (0 = censoring, 1 = lifetime observed).
The regression model fitted to the Stanford heart transplant data set is given by respectively, where the random variable y_i follows the LRBOLLW distribution.
Table 5 lists the MLEs of the model parameters with their SEs and p-values, together with the −ℓ, AIC and BIC statistics, for the fitted regression models. Based on the figures in Table 5, the LRBOLLW model has the lowest values of the −ℓ, AIC and BIC statistics. Therefore, the LRBOLLW regression model outperforms the others for this data set. According to the results of the LRBOLLW regression model, β_1 and β_2 are statistically significant at the 5% level.
The modified deviance residual, proposed by Therneau et al. (1990), is given by $\hat r_{D_i} = \mathrm{sign}(\hat r_{M_i})\,\{-2[\hat r_{M_i} + \delta_i \log(\delta_i - \hat r_{M_i})]\}^{1/2}$, where $\hat r_{M_i}$ is the martingale residual and δ_i is the censoring indicator (1 for an observed lifetime, 0 for a censored one). Figure 6 displays the index plot of the modified deviance residuals and their Q-Q plot against the N(0, 1) quantiles for the Stanford heart transplant data set. Based on Figure 6, we conclude that none of the observed values appears to be a possible outlier. Therefore, the fitted model is appropriate for this data set.

Conclusions
We introduce the RBOLL-G family of distributions, which adds two extra parameters to a baseline distribution. So, the new class extends several widely known distributions. Some of its special models are considered. We demonstrate that the density function of any RBOLL-G distribution is a linear combination of exponentiated-G density functions. Explicit expressions for the ordinary and incomplete moments, generating and quantile functions, mean deviations, probability weighted moments and order statistics are derived for any RBOLL-G distribution. The model parameters are estimated by maximum likelihood. A regression model based on the proposed model is suggested. Three examples applied to real data illustrate the potential of the new family.

Appendix A
We present four power series for the proof of the linear representation in Section 3. First, for a > 0 real non-integer and |u| < 1, we have the binomial expansion $(1-u)^{a} = \sum_{j=0}^{\infty} (-1)^{j} \binom{a}{j} u^{j}$, (39) where the binomial coefficient is defined for any real a.
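Second, the following expansion holds for any α > 0 real non-integer: $G(x)^{\alpha} = \sum_{r=0}^{\infty} s_r(\alpha)\, G(x)^{r}$, (40) where $s_r(\alpha) = \sum_{j=r}^{\infty} (-1)^{r+j} \binom{\alpha}{j} \binom{j}{r}$. Third, by expanding $z^{\lambda}$ in Taylor series, we have $z^{\lambda} = \sum_{k=0}^{\infty} (\lambda)_k (z-1)^{k}/k! = \sum_{i=0}^{\infty} f_i z^{i}$, (41) where $f_i = f_i(\lambda) = \sum_{k=i}^{\infty} \frac{(-1)^{k-i}}{k!} \binom{k}{i} (\lambda)_k$ and $(\lambda)_k = \lambda(\lambda-1)\cdots(\lambda-k+1)$ is the descending factorial. Fourth, we use the result of Gradshteyn and Ryzhik ([8], Section 0.314) for a power series raised to a positive integer i, given by $\left(\sum_{j=0}^{\infty} a_j x^{j}\right)^{i} = \sum_{j=0}^{\infty} c_{i,j} x^{j}$, (42) where the coefficients $c_{i,j}$ (for j = 1, 2, …) are easily obtained from the recurrence equation $c_{i,j} = (j a_0)^{-1} \sum_{m=1}^{j} [m(i+1) - j]\, a_m c_{i,j-m}$ and $c_{i,0} = a_0^{i}$. Hence, the coefficients $c_{i,j}$ can be calculated directly from $c_{i,0}, \ldots, c_{i,j-1}$ and, therefore, from $a_0, \ldots, a_j$. They can be given explicitly in terms of the $a_j$'s, although this is not necessary for programming our expansions numerically in any algebraic or numerical software.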
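We now obtain an expansion for $[G(x)^{a} + \bar G(x)^{a}]^{c}$. We can write from equations (39) and (40) $G(x)^{a} + \bar G(x)^{a} = \sum_{j=0}^{\infty} t_j G(x)^{j}$, where $t_j = t_j(a) = s_j(a) + (-1)^{j} \binom{a}{j}$. Then, using (41), we can write $[G(x)^{a} + \bar G(x)^{a}]^{c} = \sum_{i=0}^{\infty} f_i \left(\sum_{j=0}^{\infty} t_j G(x)^{j}\right)^{i}$, where $f_i = f_i(c)$. Finally, using equation (42), we obtain $[G(x)^{a} + \bar G(x)^{a}]^{c} = \sum_{j=0}^{\infty} h_j G(x)^{j}$, where $h_j = h_j(a, c) = \sum_{i=0}^{\infty} f_i m_{i,j}$, $m_{i,j} = (j t_0)^{-1} \sum_{m=1}^{j} [m(i+1) - j]\, t_m m_{i,j-m}$ (for j ≥ 1) and $m_{i,0} = t_0^{i}$.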
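The recurrence for a power series raised to a positive integer power is easy to program, as the following Python sketch suggests. It implements the standard form of the Gradshteyn and Ryzhik (Section 0.314) recurrence, c_{i,j} = (j a_0)^{-1} Σ_{m=1}^{j} [m(i + 1) − j] a_m c_{i,j−m} with c_{i,0} = a_0^i, for a series truncated to finitely many terms. The function name `series_power` is hypothetical.

```python
def series_power(a, i, n_terms):
    """Coefficients c_{i,0}, ..., c_{i,n_terms-1} of (sum_j a[j] x^j)**i.

    a: leading coefficients of the series (missing ones are treated as zero);
    i: positive integer power. Requires a[0] != 0.
    """
    if a[0] == 0:
        raise ValueError("the recurrence requires a[0] != 0")
    c = [a[0] ** i]
    for j in range(1, n_terms):
        s = 0.0
        for m in range(1, j + 1):
            a_m = a[m] if m < len(a) else 0.0
            s += (m * (i + 1) - j) * a_m * c[j - m]
        c.append(s / (j * a[0]))
    return c


if __name__ == "__main__":
    # (1 + x)^3 has coefficients 1, 3, 3, 1, 0.
    print(series_power([1.0, 1.0], 3, 5))
```

The same routine yields the quantities m_{i,j} when it is called with the coefficients t_0, t_1, … in place of a_0, a_1, ….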