Generalized Odd Power Cauchy Family and Its Associated Heteroscedastic Regression Model

This study introduces a generalization of the odd power Cauchy family by adding one more shape parameter to gain more flexibility in modeling complex data structures. Linear representations for the density, moments, quantile, and generating functions are derived. The model parameters are estimated by the maximum likelihood method. Monte Carlo simulations are performed under different parameter settings and sample sizes for the proposed models. In addition, we introduce a new heteroscedastic regression model based on a special member of the proposed family. Three data sets are analyzed with the proposed and competing models.


Introduction
Rooks et al. (2010) proposed the power Cauchy (PC) distribution. The probability density function (pdf) and cumulative distribution function (cdf) of the PC distribution are $f(x) = \frac{2\alpha}{\pi\sigma}\left(\frac{x}{\sigma}\right)^{\alpha-1}\left[1+\left(\frac{x}{\sigma}\right)^{2\alpha}\right]^{-1}$ and $F(x) = \frac{2}{\pi}\arctan\left[\left(\frac{x}{\sigma}\right)^{\alpha}\right]$ for $x > 0$, where α > 0 is a shape parameter and σ > 0 is a scale parameter. Using the PC distribution as the generator, we propose a new family of distributions named the generalized odd power Cauchy-G (GOPC-G) family. The cdf of the GOPC-G family is given by (3) and its density by (4), where $g(x; \kappa)$ is the pdf of the baseline distribution with parameter vector κ. The GOPC-G family contains some of the G-class distributions as its submodels. For instance, when α = 1, the GOPC-G family reduces to the odd power Cauchy (OPC-G) family (Alizadeh et al., 2018). When β = 1, the GOPC-G family reduces to the generalized odd half-Cauchy (GOHC-G) family. Additionally, when α = β = 1, we have the odd half-Cauchy (OHC-G) family. Henceforward, a random variable X with the density in (4) is denoted by X ∼ GOPC-G(α, β, κ). The hazard rate function (hrf) of X follows from (3) and (4) as $h(x) = f(x)/[1 - F(x)]$. Now, some possible relations of the GOPC-G family with other families are given.
The other main purpose is to provide a new use of the generalized odd power Cauchy Weibull (GOPC-W) distribution in a regression framework, where both the location and dispersion parameters of a regression model based on the logarithm of the GOPC-W random variable vary across observations through regression structures. The log-transform of the random variable having the GOPC-W density is used to construct a new regression model. The proposed regression model is appropriate for modeling both censored and uncensored response variables. This approach is very common in constructing regression models in survival analysis, e.g., Liu.
The rest of the study is organized as follows. Some special cases of the GOPC-G family are given in Section 2. In Section 3, the statistical properties of the GOPC-G family are discussed in detail. In Section 4, parameter estimation for the GOPC-G family is addressed using the maximum likelihood method, and a simulation study evaluates the performance of the estimation method for finite sample sizes. The heteroscedastic regression model is defined in Section 5. In Section 6, three data sets are analyzed to demonstrate the usefulness of the proposed models in real-life problems. Section 7 contains the conclusions of the study.

Special Models
Three special members of the proposed family are provided.

Generalized odd power Cauchy-normal (GOPC-N)
The GOPC-N distribution opens new opportunities to generate unimodal or bimodal and skew-symmetric normal distributions. Its density is obtained from (4) with $g(x) = \phi\left((x-\mu)/\sigma\right)/\sigma$ and $G(x) = \Phi\left((x-\mu)/\sigma\right)$, where ϕ(·) and Φ(·) are the pdf and cdf of the standard normal distribution, α > 0 and β > 0 are shape parameters, µ ∈ R is a location parameter and σ > 0 is a scale parameter.

GENERALIZED ODD POWER CAUCHY FAMILY

Generalized odd power Cauchy-Weibull (GOPC-W)
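The cdf of the Weibull distribution is $G(x) = 1 - \exp\left[-\left(x/b\right)^{a}\right]$ for $x > 0$ and $a, b > 0$. The GOPC-W pdf follows from (4) by substituting this cdf and the corresponding Weibull pdf $g(x) = (a/b)\,(x/b)^{a-1}\exp\left[-\left(x/b\right)^{a}\right]$.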
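As a rough illustration of the GOPC-W construction, the sketch below evaluates the cdf, pdf and hrf numerically. The displayed forms of (3) and (4) are not reproduced in this text, so the code assumes $F(x) = \frac{2}{\pi}\arctan\{[G(x)^{\alpha}/(1-G(x)^{\alpha})]^{\beta}\}$, which is the form consistent with the stated submodels (α = 1 gives OPC-G, β = 1 gives GOHC-G); the pdf is the derivative of that assumed cdf. All function names are hypothetical.

```python
import math

def weibull_cdf(x, a, b):
    """Weibull cdf G(x) = 1 - exp(-(x/b)^a), with shape a and scale b."""
    return -math.expm1(-((x / b) ** a))

def weibull_pdf(x, a, b):
    """Weibull pdf g(x) = (a/b) (x/b)^(a-1) exp(-(x/b)^a)."""
    return (a / b) * (x / b) ** (a - 1.0) * math.exp(-((x / b) ** a))

def gopc_cdf(G, alpha, beta):
    """ASSUMED GOPC-G cdf, evaluated at a baseline cdf value G in (0, 1)."""
    t = (G ** alpha / (1.0 - G ** alpha)) ** beta
    return (2.0 / math.pi) * math.atan(t)

def gopc_pdf(G, g, alpha, beta):
    """Derivative of the assumed cdf, given G = G(x) and g = g(x)."""
    Ga = G ** alpha
    t = (Ga / (1.0 - Ga)) ** beta
    numerator = 2.0 * alpha * beta * g * G ** (alpha * beta - 1.0)
    denominator = math.pi * (1.0 - Ga) ** (beta + 1.0) * (1.0 + t * t)
    return numerator / denominator

def gopcw_cdf(x, alpha, beta, a, b):
    return gopc_cdf(weibull_cdf(x, a, b), alpha, beta)

def gopcw_pdf(x, alpha, beta, a, b):
    return gopc_pdf(weibull_cdf(x, a, b), weibull_pdf(x, a, b), alpha, beta)

def gopcw_hrf(x, alpha, beta, a, b):
    """Hazard rate h(x) = f(x) / (1 - F(x))."""
    return gopcw_pdf(x, alpha, beta, a, b) / (1.0 - gopcw_cdf(x, alpha, beta, a, b))
```

The functions are only meant for arguments well inside the support; very large x drives the baseline cdf to 1 in floating point and would need extra care.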

Main Properties
In this section, we study some of the statistical properties of the GOPC-G family.

Quantile Function
The quantile function (qf) is the solution of $F(x) = u$, where $u \sim U(0, 1)$, and it is denoted as $Q_G(u)$. The qf of the GOPC-G family is given by (6). The qf in (6) is very useful for generating random variables from the GOPC-G family for a given baseline distribution. Additionally, Bowley's skewness (Bowley, 1901) and Moors's kurtosis (Moors, 1986) are calculated from the quantiles, so the qf is also useful for investigating the shape of the GOPC-G family. Writing $Q(\cdot)$ for the qf in (6), these measures are given by $B = \frac{Q(3/4) + Q(1/4) - 2Q(1/2)}{Q(3/4) - Q(1/4)}$ and $M = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}$. These measures are less sensitive to outliers. Moreover, they can be used as alternative measures when the moments of the distribution do not have a closed form. The Bowley's skewness and Moors's kurtosis of the GOPC-W distribution are summarized in Table 1. These results show the effects of the parameters α and β on the skewness and kurtosis measures.
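The quantile-based tools described above can be sketched as follows. Since (6) is not reproduced here, the qf is obtained by inverting the assumed cdf $F(x) = \frac{2}{\pi}\arctan\{[G(x)^{\alpha}/(1-G(x)^{\alpha})]^{\beta}\}$; that form and all function names are assumptions made for illustration only. The Bowley and Moors formulas are the standard quantile-based definitions.

```python
import math
import random

def gopc_quantile(u, alpha, beta, baseline_qf):
    """Invert the ASSUMED GOPC-G cdf: solve F(x) = u for x."""
    t = math.tan(math.pi * u / 2.0) ** (1.0 / beta)  # odds of G(x)^alpha
    p = (t / (1.0 + t)) ** (1.0 / alpha)             # baseline probability G(x)
    return baseline_qf(p)

def weibull_qf(p, a, b):
    """Weibull quantile function with shape a and scale b."""
    return b * (-math.log1p(-p)) ** (1.0 / a)

def gopcw_sample(n, alpha, beta, a, b, rng):
    """Inverse-transform sampling: plug uniform draws into the qf."""
    return [gopc_quantile(rng.random(), alpha, beta,
                          lambda p: weibull_qf(p, a, b)) for _ in range(n)]

def bowley_skewness(Q):
    """Bowley's skewness from a quantile function Q."""
    return (Q(0.75) + Q(0.25) - 2.0 * Q(0.5)) / (Q(0.75) - Q(0.25))

def moors_kurtosis(Q):
    """Moors's kurtosis from a quantile function Q."""
    return (Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8)) / (Q(6 / 8) - Q(2 / 8))

# Usage with the parameter values of the simulation study.
Q = lambda u: gopc_quantile(u, 0.5, 0.5, lambda p: weibull_qf(p, 2.0, 2.0))
sample = gopcw_sample(5, 0.5, 0.5, 2.0, 2.0, random.Random(1))
skewness, kurtosis = bowley_skewness(Q), moors_kurtosis(Q)
```

Uniform draws extremely close to 1 can push the baseline probability to exactly 1 in floating point, so a production sampler would clip them; the sketch leaves that out for brevity.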

Linear representation
By using Taylor and generalized binomial expansions, and then applying the generalized binomial expansion again together with the power series for the ratio of two power series, the pdf (4) of X can be written in the form (7), where $h_a(x) = a\,g(x)\,G(x)^{a-1}$ denotes the exponentiated-G ("exp-G") density function. Several studies, such as Mudholkar et al. (1995) and Nadarajah and Kotz (2006), have studied the properties of the exp-G densities. Equation (7) confirms that the GOPC-G density function can be expressed as a linear combination of exp-G densities.

Moments
Let $Y_k$ be a random variable having the exp-G density $h_{k+1}(x)$. The $n$th moment of $X$, $E(X^n)$, can be determined from (7) as a linear combination of the moments $E(Y_k^n)$, which can be obtained following the work of Nadarajah and Kotz (2006). Besides, $\psi(n, k)$ was obtained for several distributions by Cordeiro and Nadarajah (2011).

Generating function
Let $M_X(t) = E(e^{tX})$ be the moment generating function (mgf) of X. From (7), we obtain the representation (9) of $M_X(t)$, which can be determined from the exp-G generating function. It is possible to derive the mgfs of the special members of the GOPC-G family from (9).

Theorem 1
If $G(x)$ has an mgf, then $F(x)$ has an mgf.

The first integral in the last line is finite and the second integral is bounded above by a finite quantity.
Then, $M_X(t) < \infty$.

Corollary 1
Every distribution in the GOPC-G family has exactly the same number of moments as G(x).

Mean deviations
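The mean deviations about the mean, $\delta_1$, and about the median, $\delta_2$, are, respectively, $\delta_1 = 2\mu'_1 F(\mu'_1) - 2 m_1(\mu'_1)$ and $\delta_2 = \mu'_1 - 2 m_1(M)$, where $\mu'_1 = E(X)$ and $M$ is the median of X. The quantity $F(\mu'_1)$ can be calculated from (3) and $m_1(z)$ is the first incomplete moment obtained from (11) for n = 1.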

Estimation and Simulation study
The maximum likelihood method is used to obtain the maximum likelihood estimates (MLEs) of the parameters of the GOPC-G density. Let $x_1, \ldots, x_n$ be a sample from the GOPC-G density and let $\Theta = (\alpha, \beta, \kappa^{\top})^{\top}$ be the parameter vector. The log-likelihood function (12) of the GOPC-G density is $\ell(\Theta) = \sum_{i=1}^{n} \log f(x_i; \Theta)$, where f is the density in (4). Direct maximization of (12) gives the MLE of Θ. This can be done with statistical or mathematical software such as R, MATLAB or SAS. Here, we use the optim function of R, which is commonly used for optimization from a given initial vector.
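The objective that a numerical optimizer would minimize can be sketched as below for the GOPC-W case. This is a Python illustration rather than the authors' R code, and it assumes the density implied by $F(x) = \frac{2}{\pi}\arctan\{[G(x)^{\alpha}/(1-G(x)^{\alpha})]^{\beta}\}$ with a Weibull baseline; the function names are hypothetical.

```python
import math

def gopcw_logpdf(x, alpha, beta, a, b):
    """Log-density of the ASSUMED GOPC-W form (Weibull shape a, scale b)."""
    z = (x / b) ** a
    G = -math.expm1(-z)                                        # Weibull cdf
    log_g = math.log(a / b) + (a - 1.0) * math.log(x / b) - z  # log Weibull pdf
    Ga = G ** alpha
    t = (Ga / (1.0 - Ga)) ** beta
    return (math.log(2.0 * alpha * beta / math.pi) + log_g
            + (alpha * beta - 1.0) * math.log(G)
            - (beta + 1.0) * math.log1p(-Ga)
            - math.log1p(t * t))

def neg_loglik(theta, data):
    """Negative log-likelihood for theta = (alpha, beta, a, b)."""
    alpha, beta, a, b = theta
    if min(theta) <= 0.0:
        return math.inf  # outside the parameter space
    try:
        return -sum(gopcw_logpdf(x, alpha, beta, a, b) for x in data)
    except (ValueError, ZeroDivisionError, OverflowError):
        return math.inf  # numerically degenerate point, reject it

# Usage: evaluate the objective on a small illustrative data vector.
data = [0.8, 1.2, 1.5, 1.9, 2.3, 2.7, 3.1]
value = neg_loglik((0.5, 0.5, 2.0, 2.0), data)
```

Returning infinity for invalid or degenerate points lets a derivative-free routine (such as the Nelder-Mead method that R's optim uses by default) step away from them instead of stopping with an error.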

Simulation Study
It is important to investigate the behaviour of the MLEs for the GOPC-G density under a given baseline distribution. For this purpose, we implement a simulation study to assess the performance of the MLEs for finite sample sizes. The simulation results are evaluated based on the following metrics: bias, mean square error (MSE), estimated average length (AL) and coverage probability (CP).
These measures are computed from the estimates and their standard errors over the replications, where ϵ = α, β, a, b and $(s_{\hat{\alpha}_i}, s_{\hat{\beta}_i}, s_{\hat{a}_i}, s_{\hat{b}_i})$ represent the standard errors of the estimates in the $i$th replication. We choose the Weibull distribution as the baseline distribution, so the GOPC-W distribution is used. The simulation is replicated N = 10,000 times for each sample size. The sample size starts at n = 50 and increases in steps of 5 up to n = 1,000. The parameters of the GOPC-W distribution are set to α = 0.5, β = 0.5, a = 2, b = 2. The simulation results are shown in Figure 1.
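A minimal sketch of how the four metrics can be computed for one parameter is given below. The exact formulas of the paper are not reproduced in this text, so the code uses the usual definitions and assumes 95% Wald-type intervals (normal quantile 1.959964); the function name is hypothetical.

```python
def simulation_metrics(estimates, std_errors, true_value, z=1.959964):
    """Bias, MSE, average interval length and coverage over N replications.

    ASSUMPTION: intervals are estimate +/- z * standard error (95% by default).
    """
    n_rep = len(estimates)
    bias = sum(e - true_value for e in estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    avg_length = 2.0 * z * sum(std_errors) / n_rep
    covered = sum(1 for e, s in zip(estimates, std_errors)
                  if e - z * s < true_value < e + z * s)
    return {"bias": bias, "mse": mse, "al": avg_length, "cp": covered / n_rep}

# Usage with four made-up replications of an estimator of a = 2.
metrics = simulation_metrics([1.9, 2.1, 2.0, 2.4], [0.1, 0.1, 0.1, 0.1], 2.0)
```

In a full study this function would be called once per parameter and sample size, with the estimates and standard errors collected from the N fitted models.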

The Heteroscedastic LGOPC-W Regression Model
Here, we introduce the heteroscedastic regression model based on the GOPC-W density given in Section 2.3. Applying the transformation Y = log(X) and the re-parametrizations $a = 1/\sigma$ and $b = e^{\mu}$ to the GOPC-W density, we obtain the density (13), where $\mu \in \mathbb{R}$ is the location of Y and σ > 0 is the scale parameter. The parameters α > 0 and β > 0 control the shape of the density. The density in (13) is called log-GOPC-W and denoted by Y ∼ LGOPC-W(α, β, µ, σ). The survival function of (13) is given by (14). Using the transformation Z = (Y − µ)/σ, the standardized log-GOPC-W density is given by (15). In many practical applications, lifetimes are affected by explanatory variables. Parametric models for estimating univariate survival functions and for censored-data regression problems are widely used. Standard regression models require the assumption of homogeneity of error variances. Therefore, it is necessary to check for possible heteroscedasticity.
Let $y_i$ be a response variable following the LGOPC-W density, and let $x_i = (x_{i1}, \ldots, x_{ip_1})$ and $w_i = (w_{i1}, \ldots, w_{ip_2})$ be the vectors of independent variables for the location and scale parameters of the LGOPC-W regression model. The regression model (16) is given by $y_i = \mu_i + \sigma_i z_i$, where $z_i$ has the density in (15). The independent variables are linked to the location parameter $\mu_i$ by the identity link function, $\mu_i = x_i \eta$. The heteroscedasticity is modeled by means of the scale parameter of the response variable, and the log-link function is used to link the independent variables to the scale parameter, $\sigma_i = \exp(w_i \tau)$, where $\eta = (\eta_1, \ldots, \eta_{p_1})$ and $\tau = (\tau_1, \ldots, \tau_{p_2})$ are the regression parameter vectors. Note that when $\sigma_i = \sigma$, the regression model (16) reduces to the homoscedastic regression model, where only µ is modeled using explanatory variables. The general formula for the log-likelihood function of the regression model (16) for the parameter vector Θ = (α, β, η, τ) is $\ell(\Theta) = \sum_{i \in F} \log f(y_i) + \sum_{i \in C} \log S(y_i)$, where F and C denote the sets of uncensored and censored observations, $f(y_i)$ is the density (13) and $S(y_i)$ is the survival function (14) of $Y_i$. The response variable is defined as $y_i = \min\{\log(x_i), \log(c_i)\}$, where $x_i$ is the observed lifetime and $c_i$ is the censoring time. Under these specifications, the log-likelihood function of the LGOPC-W regression model is given by (17), where $u_i = \exp(z_i)$, $z_i = (y_i - \mu_i)/\sigma_i$, and r is the number of uncensored observations. The estimate of the unknown parameter vector Θ is obtained by maximizing (17) with the optim function of R. Further, we can use the LR statistic to test whether the dispersion is constant across different ranges/levels of the explanatory variables (similar to the assumption of homogeneity of variance).
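The censored log-likelihood described above can be sketched as follows. The explicit forms of (13), (14) and (17) are not reproduced in this text, so the density and survival function are derived from the assumed cdf $F(x) = \frac{2}{\pi}\arctan\{[G(x)^{\alpha}/(1-G(x)^{\alpha})]^{\beta}\}$ with a Weibull baseline, under Y = log(X), a = 1/σ and b = exp(µ). The function names and the argument layout are hypothetical.

```python
import math

def _odds_power(y, alpha, beta, mu, sigma):
    """Return u = exp(z), G = 1 - exp(-u), G^alpha and t = [G^a/(1-G^a)]^beta."""
    u = math.exp((y - mu) / sigma)
    G = -math.expm1(-u)
    Ga = G ** alpha
    return u, G, Ga, (Ga / (1.0 - Ga)) ** beta

def lgopcw_logpdf(y, alpha, beta, mu, sigma):
    """Log-density of Y = log(X) under the ASSUMED GOPC-W form."""
    u, G, Ga, t = _odds_power(y, alpha, beta, mu, sigma)
    return (math.log(2.0 * alpha * beta / (math.pi * sigma))
            + math.log(u) - u
            + (alpha * beta - 1.0) * math.log(G)
            - (beta + 1.0) * math.log1p(-Ga)
            - math.log1p(t * t))

def lgopcw_logsf(y, alpha, beta, mu, sigma):
    """Log-survival: 1 - (2/pi) atan(t) equals (2/pi) atan2(1, t) for t >= 0."""
    t = _odds_power(y, alpha, beta, mu, sigma)[3]
    return math.log(2.0 / math.pi * math.atan2(1.0, t))

def regression_loglik(alpha, beta, eta, tau, y, X, W, delta):
    """Sum log f over uncensored (delta = 1) and log S over censored (delta = 0)."""
    total = 0.0
    for yi, xi, wi, di in zip(y, X, W, delta):
        mu_i = sum(e * x for e, x in zip(eta, xi))                 # identity link
        sigma_i = math.exp(sum(t * w for t, w in zip(tau, wi)))    # log link
        if di:
            total += lgopcw_logpdf(yi, alpha, beta, mu_i, sigma_i)
        else:
            total += lgopcw_logsf(yi, alpha, beta, mu_i, sigma_i)
    return total
```

Each row of X and W is expected to start with a 1 when an intercept is wanted; setting every row of W to `[1.0]` gives the homoscedastic special case.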

Simulation study
Here, the performance of the MLEs of the parameters in the heteroscedastic LGOPC-W regression model is examined by means of a simulation study. We generate N = 1,000 samples of sizes n = 50, 250, 500 and 1,000 from model (16), considering the structures $\mu = \eta_0 + \eta_1 x_1$ and $\sigma = \exp(\tau_0 + \tau_1 x_1)$, where $x_1$ is generated from a binomial (n, 0.5) distribution. The values of the parameters are taken as α = 2, β = 2, $\eta_0 = 2$, $\eta_1 = 2$, $\tau_0 = 2$, $\tau_1 = 0.5$, and the response variable Y is generated using the inverse transform method. The simulation results are given in Table 2. The results are interpreted based on the biases, averages of estimates (AEs) and MSEs. As expected, the biases and MSEs decrease as the sample size increases. Also, the AEs are near the true values of the parameters.

Applications
Three data sets are analyzed to demonstrate the usefulness of the GOPC-G family in real data modeling. The GOPC-G family is compared with the families below.
Table 3. MLEs with their SEs of the fitted models and goodness-of-fit statistics for the strengths of glass fibers data set.
The LR test is used to compare the GOPC-LL distribution with its sub-models. The null hypothesis $H_0: \beta = 1$ is tested against $H_1: \beta \neq 1$ to compare the GOPC-LL and GOHC-LL models. Similarly, testing $H_0: \alpha = 1$ against $H_1: \alpha \neq 1$ is equivalent to comparing the GOPC-LL and OPC-LL models. The results are listed in Table 4, which reveals that the GOPC-LL distribution provides a better fit than its sub-models at the 10% significance level for the data used.

Data: Ozone level
The second data set, available in Nadarajah (2008), consists of daily ozone level measurements (in ppb = ppm×1000).
In this application, we investigate the performance of the GOPC-W distribution and compare it with other generalizations of the Weibull distribution. The results are given in Table 5. From these results, we conclude that the GOPC-W distribution is the best among the fitted models since it has the smallest values of the model selection criteria. The results of the LR test are given in Table 6, which shows that the GOPC-W distribution performs better than its sub-models at the 5% significance level.

Data: Voltage
The third data set comes from an accelerated voltage life test on specimens of solid epoxy electrical insulation (see Lawless, 2003). The sample size is n = 60 and the voltage levels are 52.5, 55.0 and 57.5. The dependent variable $y_i$ is the log-failure time of the epoxy insulation specimens and the independent variable $x_{i1}$ is the voltage level. The Bartlett test is used to explore possible heteroscedasticity. The p-value of the Bartlett test is 0.0241, which indicates that the homogeneity of variance assumption is violated. The heteroscedastic and homoscedastic versions of the regression model (16) are fitted to the voltage data set, where $y_i$ follows the density in (13). The results of the regression models are given in Table 7. The results of the log-Weibull (LW), log-OPC-W (LOPC-W) and log-EGOHC-W (LEGOHC-W) regression models are also given as competing models. The best model is selected based on the AIC and Bayesian Information Criterion (BIC) statistics. From Table 7, the heteroscedastic LGOPC-W regression model is chosen as the best model since it has the lowest values of the AIC and BIC statistics. The homoscedastic variance assumption is tested with the LR test, which also serves to compare the homoscedastic and heteroscedastic LGOPC-W regression models. The null hypothesis $H_0: \tau_2 = 0$ is tested. The test statistic value is w = 5.499 with a p-value of 0.019, so the null hypothesis is rejected at the 5% level. It means that the heteroscedastic LGOPC-W regression model is better than the homoscedastic LGOPC-W regression model. The residual analysis is based on the randomized quantile residuals proposed by Dunn and Smyth (1996). They are calculated as $\hat{r}_i = \Phi^{-1}(\hat{u}_i)$, where $\Phi^{-1}(\cdot)$ is the qf of the standard normal distribution and $\hat{u}_i = F(t_i \mid \hat{\theta}_i)$. Figure 2 displays the quantile-quantile plots of the randomized quantile residuals for the heteroscedastic and homoscedastic LGOPC-W regression models.
These figures confirm that the heteroscedastic LGOPC-W regression model is better than the homoscedastic LGOPC-W regression model.
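The reported LR p-value and the residual transformation can be checked with a few lines of standard-library Python. The sketch assumes that the LR statistic is referred to a chi-square distribution with one degree of freedom (one restricted parameter, $\tau_2$); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def lr_pvalue_df1(w):
    """Upper-tail chi-square(1) probability: P(chi2_1 > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(w / 2.0))

def quantile_residual(u_hat):
    """Quantile residual: standard normal qf applied to the fitted cdf value."""
    return NormalDist().inv_cdf(u_hat)

p_value = lr_pvalue_df1(5.499)       # LR statistic reported for H0: tau_2 = 0
residual = quantile_residual(0.975)  # a fitted cdf value of 0.975, for illustration
```

For censored observations, the randomized version of Dunn and Smyth (1996) draws the cdf value uniformly between the fitted cdf at the censoring time and 1 before applying the normal qf; the sketch covers only the uncensored case.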

Conclusions
A new family of distributions called generalized odd power Cauchy-G, GOPC-G for short, is introduced. The proposed family is studied in detail. A heteroscedastic regression model based on a new generalization of the Weibull distribution is proposed. Two simulation studies are carried out to assess the suitability of the maximum likelihood method for estimating the model parameters. Three applications are presented to illustrate the usefulness of the proposed models. We hope that the proposed family will attract attention in different applied sciences.