On Size Biased Kumaraswamy Distribution

In this paper, we introduce and study the size-biased form of the Kumaraswamy distribution. The Kumaraswamy distribution, which has drawn considerable attention in hydrology and related areas, was proposed by Kumaraswamy. The new distribution is derived under size-biased probability of sampling, taking the weights as the variate values. Various distributional and characterizing properties of the model are studied. The methods of maximum likelihood and matching quantiles estimation are employed to estimate the parameters of the proposed model. Finally, we apply the proposed model to simulated and real data sets.


Introduction
The concept of weighted distributions was first introduced by Fisher [4] to model ascertainment bias, and was later formalized in a unifying theory by Rao [13]. Let X be a random variable of interest such that X ∼ f(x; θ), where θ is a vector of parameters. Under equal-probability sampling, many methods are available for estimating θ. Under size-biased sampling schemes, however, the probability of sampling an individual is proportional to $x^r$, provided that $E_\theta(X^r) < \infty$ for all θ. In such situations the weighted probability density function is defined as

$$f^{w}(x;\theta) = \frac{x^{r} f(x;\theta)}{E_\theta(X^{r})}, \qquad x > 0, \tag{1}$$

which can then be used in place of f(x; θ) to model the observed data.

Weighted distributions have a variety of uses in many fields, and a number of papers have implicitly used the concepts of weighted and size-biased sampling distributions. Patil and Rao [11] briefly surveyed the applications of weighted and size-biased distributions. Size-biased distributions arise naturally in a range of sampling and modeling problems in forestry [6]. They also occur in applications spanning environmental sciences, econometrics, human demography and biomedical sciences [12,16]. For further applications, see [2,3,8,9,14,17,18,19].

When the probability of observing a positive-valued random variable is proportional to the value of the variable, the resulting distribution is a size-biased distribution. The size-biased distribution of order 1 is the special case of the weighted distribution (1) with weight x. In this paper, the term size-biased distribution refers to the size-biased distribution of order 1. Thus, taking r = 1 in (1), we obtain the size-biased distribution with p.d.f.

$$f^{*}(x;\theta) = \frac{x\, f(x;\theta)}{E_\theta(X)}, \qquad x > 0. \tag{2}$$
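The following Python function is a minimal numerical sketch of the order-r weighted density $x^r f(x)/E(X^r)$. It builds that density from any density on $(0, \infty)$ and computes $E(X^r)$ by quadrature with SciPy. The helper name `weighted_pdf` is hypothetical and does not come from the paper. As a check, size-biasing an Exponential(1) density with r = 1 should give $x e^{-x}$, which is the Gamma(2, 1) density.

```python
import numpy as np
from scipy import integrate, stats

def weighted_pdf(base_pdf, r=1.0, lower=0.0, upper=np.inf):
    """Return the order-r size-biased density x**r f(x) / E(X**r).

    E(X**r) is obtained by numerical quadrature over (lower, upper),
    so this is a numerical illustration only.
    """
    norm, _ = integrate.quad(lambda x: x**r * base_pdf(x), lower, upper)
    return lambda x: x**r * base_pdf(x) / norm

# Size-biasing (r = 1) an Exponential(1) density should give x * exp(-x), i.e. Gamma(2, 1).
sb_exp = weighted_pdf(stats.expon.pdf, r=1.0)
print(sb_exp(1.5), stats.gamma(2).pdf(1.5))
```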

The Size Biased Kumaraswamy Distribution
The Kumaraswamy distribution [7] is similar to the Beta distribution but much simpler to use, especially in simulation studies, because both its probability density function and its cumulative distribution function have simple closed forms. It is mainly used for variables that are bounded below and above. The probability density function (pdf) of the Kumaraswamy distribution (Kum) is given by

$$f_K(x; a, b) = a b\, x^{a-1} \left(1 - x^{a}\right)^{b-1}, \qquad 0 < x < 1, \tag{3}$$

where a > 0 and b > 0 are the two shape parameters.
The r th order raw moment of the Kum is given by

$$E(X^{r}) = b\, B\!\left(1 + \frac{r}{a}, b\right),$$

where B(α, β) is the beta function, defined by the integral

$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1} (1 - t)^{\beta-1}\, dt, \qquad \alpha > 0,\ \beta > 0.$$

Thus, the expectation of the Kum is given by

$$E(X) = b\, B\!\left(1 + \frac{1}{a}, b\right). \tag{4}$$

Using relations (2), (3) and (4), the pdf of the SBKD is obtained as

$$f(x; a, b) = \frac{a\, x^{a} \left(1 - x^{a}\right)^{b-1}}{B\left(1 + \frac{1}{a}, b\right)}, \qquad 0 < x < 1,\ a > 0,\ b > 0. \tag{5}$$

Ducey and Gove [3] have obtained the weighted distributions of the Generalized Beta I (GBI), the Generalized Beta II (GBII) and the Generalized Gamma (GG) distributions, and have shown that the GBI, GBII and GG distributions are form-invariant under size-biased sampling. The Kumaraswamy distribution belongs to the GBI(α, β, p, q) family of distributions [10]. Consequently, the SBKD is also a special case of the GBI distribution, with α = a > 0, β = 1, p = 1 + 1/a and q = b.
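The derivation of (5) from (2), (3) and (4) can be checked numerically with the Python sketch below. It evaluates $x f_K(x)/E(X)$ at one point, compares the result with (5), and confirms that (5) integrates to one. The function names and the parameter values a = 2.5 and b = 3 are illustrative assumptions only.

```python
import numpy as np
from scipy import integrate
from scipy.special import beta as beta_fn

def kum_pdf(x, a, b):
    """Kumaraswamy density (3): a b x^(a-1) (1 - x^a)^(b-1) on (0, 1)."""
    return a * b * x**(a - 1) * (1 - x**a)**(b - 1)

def kum_mean(a, b):
    """Kumaraswamy mean (4): b B(1 + 1/a, b)."""
    return b * beta_fn(1 + 1 / a, b)

def sbkd_pdf(x, a, b):
    """SBKD density (5): a x^a (1 - x^a)^(b-1) / B(1 + 1/a, b)."""
    return a * x**a * (1 - x**a)**(b - 1) / beta_fn(1 + 1 / a, b)

a, b = 2.5, 3.0
x = 0.4
# Size-biasing the Kumaraswamy density via (2) should reproduce (5) ...
via_eq2 = x * kum_pdf(x, a, b) / kum_mean(a, b)
# ... and (5) should integrate to one over (0, 1).
total, _ = integrate.quad(sbkd_pdf, 0, 1, args=(a, b))
print(via_eq2, sbkd_pdf(x, a, b), total)
```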

Special Cases
1. Taking a = 1 in (5), we get
$$f(x; b) = \frac{x (1 - x)^{b-1}}{B(2, b)} = b(b+1)\, x (1 - x)^{b-1}, \qquad 0 < x < 1.$$
Thus the SBKD reduces to a Beta-I distribution with parameters 2 and b.
2. Taking b = 1 in (5), we get $f(x; a) = (a + 1)\, x^{a}$, $0 < x < 1$.
3. Taking a = 1 and b = 1 in (5), the SBKD reduces to a special case of the Triangular distribution, $f(x) = 2x$, $0 < x < 1$.
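The three special cases can be confirmed on a grid of points with the short Python sketch below. It compares (5) with SciPy's `stats.beta(2, b)` density, with $(a+1)x^a$ and with $2x$. The helper `sbkd_pdf` and the chosen values of a and b are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn

def sbkd_pdf(x, a, b):
    """SBKD density (5)."""
    return a * x**a * (1 - x**a)**(b - 1) / beta_fn(1 + 1 / a, b)

x = np.linspace(0.05, 0.95, 19)
a, b = 0.6, 3.7
case1 = np.allclose(sbkd_pdf(x, 1.0, b), stats.beta(2, b).pdf(x))  # a = 1: Beta(2, b)
case2 = np.allclose(sbkd_pdf(x, a, 1.0), (a + 1) * x**a)           # b = 1: (a + 1) x^a
case3 = np.allclose(sbkd_pdf(x, 1.0, 1.0), 2 * x)                  # a = b = 1: 2x
print(case1, case2, case3)  # True True True
```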

Shape of the distribution
The SBKD is a Beta(2, b) distribution for a = 1. Hence, for a = 1 and any b, the SBKD has the same shape as a Beta(2, b) distribution, and the following shapes are obtained.
1. A Beta-I distribution is symmetric whenever its two parameters are equal. Hence, for a = 1, the SBKD is symmetric if b = 2.
2. The Beta(2, 1) distribution is the right-triangular distribution with its right angle at the right end, x = 1; its density is a straight line with slope +2. Hence the SBKD(1, 1) is also a right-triangular distribution.
3. A Beta(α, β) distribution with α ≥ 1 and β < 1 has a negatively skewed, J-shaped density. Hence the SBKD(1, b) with b < 1 has a negatively skewed, J-shaped density.
4. The Beta(2, b) distribution is unimodal for b > 1. It is positively skewed for b > 2 and negatively skewed for 1 < b < 2, and hence the SBKD(1, b) is also positively skewed for b > 2 and negatively skewed for 1 < b < 2.

Figure 1 gives a plot of the possible shapes of the distribution for a = 1.

The possible shapes of the SBKD for a < 1 and a > 1 are as follows.
1. For a < 1, b < 1, the SBKD has a J-shaped, negatively skewed density.
2. For a < 1, b = 1, the SBKD has an increasing density.
3. For a < 1, b > 1, the SBKD has either a unimodal, positively skewed density or a reverse J-shaped, positively skewed, decreasing density.
4. For a > 1, b < 1, the SBKD has a J-shaped, negatively skewed density.
5. For a > 1, b = 1, the SBKD has a negatively skewed, increasing density.
6. For a > 1, b > 1, the SBKD has a unimodal, skewed density.

Cumulative distribution function and quantile function

Theorem 1. Let X ∼ SBKD(a, b). Then the c.d.f. of X is

$$F(x) = I_{x^{a}}\!\left(1 + \frac{1}{a}, b\right), \qquad 0 < x < 1, \tag{6}$$

where $I_z(\alpha, \beta)$ is the regularized incomplete beta function. It is defined as the ratio of the incomplete beta function $B(z; \alpha, \beta) = \int_0^z x^{\alpha-1} (1 - x)^{\beta-1}\, dx$ to the complete beta function B(α, β).
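Proof. As X ∼ SBKD(a, b), its p.d.f. is given by (5). Let F(x) be the c.d.f. of the SBKD. By definition,

$$F(x) = \int_0^x f(y)\, dy.$$

Substituting f(y) from (5) and putting $u = y^{a}$, so that $a y^{a}\, dy = u^{1/a}\, du$, we have

$$F(x) = \frac{1}{B\left(1 + \frac{1}{a}, b\right)} \int_0^{x^{a}} u^{1/a} (1 - u)^{b-1}\, du = \frac{B\left(x^{a}; 1 + \frac{1}{a}, b\right)}{B\left(1 + \frac{1}{a}, b\right)} = I_{x^{a}}\!\left(1 + \frac{1}{a}, b\right).$$

Theorem 2. Let X ∼ SBKD(a, b). Then its quantile function is given by

$$Q(p) = \left[ I_p^{-1}\!\left(1 + \frac{1}{a}, b\right) \right]^{1/a}, \qquad 0 \le p \le 1, \tag{7}$$

where $I_p^{-1}(\alpha, \beta)$ is the inverse regularized beta function, defined by

$$I_p^{-1}(\alpha, \beta) = z \iff I_z(\alpha, \beta) = p. \tag{8}$$

Proof. The quantile function x = Q(p) is the solution of F(x) = p. By (6), this equation reads $I_{x^{a}}(1 + 1/a, b) = p$. Therefore, by (8), $x^{a} = I_p^{-1}(1 + 1/a, b)$, which gives the quantile function (7) of the SBKD.

Corollary 2.1. The median of the SBKD is

$$\text{Median} = Q\!\left(\tfrac{1}{2}\right) = \left[ I_{1/2}^{-1}\!\left(1 + \frac{1}{a}, b\right) \right]^{1/a}.$$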
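Relations (6), (7) and Corollary 2.1 can be checked numerically with the Python sketch below. It assumes SciPy's `scipy.special.betainc(a, b, x)` for the regularized incomplete beta function $I_x(a, b)$ and `scipy.special.betaincinv(a, b, y)` for its inverse. The helper names `sbkd_cdf` and `sbkd_quantile` and the values a = 2, b = 5 are illustrative. The sketch integrates (5) directly, evaluates the c.d.f. at the median, and runs a quantile/c.d.f. round trip.

```python
import numpy as np
from scipy import integrate
from scipy.special import beta as beta_fn, betainc, betaincinv

def sbkd_pdf(x, a, b):
    """SBKD density (5)."""
    return a * x**a * (1 - x**a)**(b - 1) / beta_fn(1 + 1 / a, b)

def sbkd_cdf(x, a, b):
    """c.d.f. (6): F(x) = I_{x^a}(1 + 1/a, b)."""
    return betainc(1 + 1 / a, b, np.asarray(x, dtype=float) ** a)

def sbkd_quantile(p, a, b):
    """Quantile function (7): Q(p) = [I_p^{-1}(1 + 1/a, b)]^(1/a)."""
    return betaincinv(1 + 1 / a, b, p) ** (1 / a)

a, b = 2.0, 5.0
x0 = 0.35
numeric, _ = integrate.quad(sbkd_pdf, 0, x0, args=(a, b))  # direct integration of (5)
median = sbkd_quantile(0.5, a, b)                         # Corollary 2.1
print(numeric, sbkd_cdf(x0, a, b), sbkd_cdf(median, a, b))
```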

Random number generation
Using the quantile function (7) of the SBKD, a random sample of size n can be simulated. Let U be a uniform U(0, 1) random variable, and let Q(p), 0 ≤ p ≤ 1, be the quantile function of the SBKD. By the uniform transformation rule [5], the variable X = Q(U) has a distribution with quantile function Q(p). Thus a random sample of size n from the SBKD can easily be simulated by generating a random sample $u_1, \ldots, u_n$ from the U(0, 1) distribution and setting $x_i = Q(u_i)$.
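The uniform transformation rule can be sketched in Python with NumPy's `default_rng` for the uniforms and SciPy's `betaincinv` for $I_p^{-1}$. The helper `rsbkd`, the seed and the values a = 2, b = 5 are illustrative assumptions. As a sanity check, the sample mean is compared with the theoretical mean $B(1 + 2/a, b)/B(1 + 1/a, b)$, which is obtained by integrating x against (5).

```python
import numpy as np
from scipy.special import beta as beta_fn, betaincinv

def sbkd_quantile(p, a, b):
    """Quantile function (7): Q(p) = [I_p^{-1}(1 + 1/a, b)]^(1/a)."""
    return betaincinv(1 + 1 / a, b, p) ** (1 / a)

def rsbkd(n, a, b, rng=None):
    """Draw n SBKD(a, b) variates by the uniform transformation rule X = Q(U)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, size=n)  # U ~ U(0, 1)
    return sbkd_quantile(u, a, b)

a, b = 2.0, 5.0
rng = np.random.default_rng(2024)
sample = rsbkd(100_000, a, b, rng=rng)
# The sample mean should be close to B(1 + 2/a, b) / B(1 + 1/a, b).
theoretical_mean = beta_fn(1 + 2 / a, b) / beta_fn(1 + 1 / a, b)
print(sample.mean(), theoretical_mean)
```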

Moment generating function of SBKD
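Theorem 3. Let X ∼ SBKD(a, b). Then the m.g.f. of X is

$$M_X(t) = \frac{1}{B\left(1 + \frac{1}{a}, b\right)} \sum_{r=0}^{\infty} \frac{t^{r}}{r!}\, B\!\left(1 + \frac{r+1}{a}, b\right). \tag{9}$$

Proof. By definition, the moment generating function (m.g.f.) of a r.v. X is

$$M_X(t) = E\left(e^{tX}\right) = \int e^{tx} f(x)\, dx = \sum_{r=0}^{\infty} \frac{t^{r}}{r!} E(X^{r}).$$

For the SBKD, using (5) and the substitution $u = x^{a}$,

$$E(X^{r}) = \frac{1}{B\left(1 + \frac{1}{a}, b\right)} \int_0^1 a\, x^{a+r} (1 - x^{a})^{b-1}\, dx = \frac{1}{B\left(1 + \frac{1}{a}, b\right)} \int_0^1 u^{(r+1)/a} (1 - u)^{b-1}\, du = \frac{B\left(1 + \frac{r+1}{a}, b\right)}{B\left(1 + \frac{1}{a}, b\right)}.$$

Thus, for the SBKD, the m.g.f. is given by (9).

Corollary 3.1. The cumulant generating function $K_X(t)$ of the SBKD is given by

$$K_X(t) = \ln M_X(t) = \ln \left[ \sum_{r=0}^{\infty} \frac{t^{r}}{r!}\, B\!\left(1 + \frac{r+1}{a}, b\right) \right] - \ln B\!\left(1 + \frac{1}{a}, b\right).$$

Corollary 3.2. The r th order raw moment of the SBKD is

$$\mu'_r = E(X^{r}) = \frac{B\left(1 + \frac{r+1}{a}, b\right)}{B\left(1 + \frac{1}{a}, b\right)}. \tag{10}$$

Corollary 3.3. The mean, i.e. the first-order raw moment, of the SBKD is

$$\mu = E(X) = \frac{B\left(1 + \frac{2}{a}, b\right)}{B\left(1 + \frac{1}{a}, b\right)}. \tag{11}$$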
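The series form of the m.g.f. can be compared with a direct evaluation of $E(e^{tX})$ by quadrature, as in the Python sketch below. The series is truncated after a finite number of terms; the cut-off `terms=60` is an assumption, not something from the paper. It is harmless here because $\mu'_r \le 1$ on (0, 1), so the r th term is bounded by $|t|^r/r!$. The function names are illustrative.

```python
import math
from scipy import integrate
from scipy.special import beta as beta_fn

def sbkd_pdf(x, a, b):
    """SBKD density (5)."""
    return a * x**a * (1 - x**a)**(b - 1) / beta_fn(1 + 1 / a, b)

def sbkd_raw_moment(r, a, b):
    """r-th raw moment (Corollary 3.2): B(1 + (r+1)/a, b) / B(1 + 1/a, b)."""
    return beta_fn(1 + (r + 1) / a, b) / beta_fn(1 + 1 / a, b)

def sbkd_mgf(t, a, b, terms=60):
    """m.g.f. (9), truncating sum_r t^r / r! * mu'_r after `terms` terms."""
    return sum(t**r / math.factorial(r) * sbkd_raw_moment(r, a, b) for r in range(terms))

a, b, t = 1.5, 2.5, 0.8
mgf_numeric, _ = integrate.quad(lambda x: math.exp(t * x) * sbkd_pdf(x, a, b), 0, 1)
print(sbkd_mgf(t, a, b), mgf_numeric)
```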

Moments of SBKD
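Theorem 4. Let X ∼ SBKD(a, b). Then the r th order raw moment $\mu'_r$ and the r th order central moment $\mu_r$ are given respectively by

$$\mu'_r = \frac{B\left(1 + \frac{r+1}{a}, b\right)}{B\left(1 + \frac{1}{a}, b\right)}, \tag{12}$$

$$\mu_r = \frac{1}{B\left(1 + \frac{1}{a}, b\right)} \sum_{k=0}^{r} \binom{r}{k} (-\mu)^{r-k}\, B\!\left(1 + \frac{k+1}{a}, b\right), \tag{13}$$

where μ is the mean of the SBKD, given by (11).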
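Proof. The proof of (12) follows directly from Corollary 3.2. The r th order central moment is defined as

$$\mu_r = E\left[(X - \mu)^{r}\right] = \int_0^1 (x - \mu)^{r} f(x)\, dx. \tag{14}$$

E(X), the first-order raw moment of the SBKD, is given by (11); denote it by μ. Expanding $(x - \mu)^{r}$ binomially and using (5), (11) and (14), the r th order central moment of the SBKD is obtained as

$$\mu_r = \sum_{k=0}^{r} \binom{r}{k} (-\mu)^{r-k} \mu'_k = \frac{1}{B\left(1 + \frac{1}{a}, b\right)} \sum_{k=0}^{r} \binom{r}{k} (-\mu)^{r-k}\, B\!\left(1 + \frac{k+1}{a}, b\right).$$

Corollary 4.1. The first four central moments are

$$\mu_1 = 0, \qquad \mu_2 = \mu'_2 - \mu^{2}, \qquad \mu_3 = \mu'_3 - 3\mu\mu'_2 + 2\mu^{3}, \qquad \mu_4 = \mu'_4 - 4\mu\mu'_3 + 6\mu^{2}\mu'_2 - 3\mu^{4},$$

where $\mu'_k$ is given by (12).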
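The binomial-expansion formula for the central moments can be checked against direct quadrature of $E[(X-\mu)^r]$ with the Python sketch below. The helper names and the values a = 3, b = 2 are illustrative. A further check is the case a = 1, b = 3, where the SBKD is Beta(2, 3) and the variance should equal $2 \cdot 3 / (5^2 \cdot 6) = 0.04$.

```python
from math import comb
from scipy import integrate
from scipy.special import beta as beta_fn

def raw_moment(k, a, b):
    """k-th raw moment (12) of the SBKD."""
    return beta_fn(1 + (k + 1) / a, b) / beta_fn(1 + 1 / a, b)

def central_moment(r, a, b):
    """r-th central moment (13) via the binomial expansion of E[(X - mu)^r]."""
    mu = raw_moment(1, a, b)
    return sum(comb(r, k) * (-mu)**(r - k) * raw_moment(k, a, b) for k in range(r + 1))

def sbkd_pdf(x, a, b):
    """SBKD density (5)."""
    return a * x**a * (1 - x**a)**(b - 1) / beta_fn(1 + 1 / a, b)

a, b = 3.0, 2.0
mu = raw_moment(1, a, b)
for r in range(1, 5):
    direct, _ = integrate.quad(lambda x: (x - mu)**r * sbkd_pdf(x, a, b), 0, 1)
    print(r, central_moment(r, a, b), direct)
```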

Skewness and kurtosis of SBKD
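The skewness of the SBKD is given by

$$\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\mu'_3 - 3\mu\mu'_2 + 2\mu^{3}}{\left(\mu'_2 - \mu^{2}\right)^{3/2}}.$$

The kurtosis of the SBKD is given by

$$\beta_2 = \frac{\mu_4}{\mu_2^{2}} = \frac{\mu'_4 - 4\mu\mu'_3 + 6\mu^{2}\mu'_2 - 3\mu^{4}}{\left(\mu'_2 - \mu^{2}\right)^{2}},$$

where $\mu'_k = B\left(1 + \frac{k+1}{a}, b\right) / B\left(1 + \frac{1}{a}, b\right)$ and $\mu = \mu'_1$.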
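In the Python sketch below, the skewness and kurtosis are computed from the raw moments and checked in the special case a = 1, where the SBKD is Beta(2, b), against SciPy's `stats.beta(2, b).stats(moments="mvsk")`. SciPy reports excess kurtosis, so 3 is added before comparing. The helper name `skew_kurt` is illustrative. For b = 2 the skewness should be zero, in line with the symmetry noted in the shape discussion.

```python
from scipy import stats
from scipy.special import beta as beta_fn

def raw_moment(k, a, b):
    """k-th raw moment of the SBKD."""
    return beta_fn(1 + (k + 1) / a, b) / beta_fn(1 + 1 / a, b)

def skew_kurt(a, b):
    """Skewness mu3 / mu2^(3/2) and kurtosis mu4 / mu2^2 from the central moments."""
    m1, m2, m3, m4 = (raw_moment(k, a, b) for k in range(1, 5))
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu3 / mu2**1.5, mu4 / mu2**2

# For a = 1 the SBKD is Beta(2, b); SciPy's "k" is the excess kurtosis.
b = 3.0
g1, beta2 = skew_kurt(1.0, b)
_, _, s, k_excess = stats.beta(2, b).stats(moments="mvsk")
print(g1, float(s), beta2, float(k_excess) + 3)
```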

Harmonic mean of SBKD
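Theorem 5. Let X ∼ SBKD(a, b). Then the harmonic mean of X is given by

$$H = b\, B\!\left(1 + \frac{1}{a}, b\right).$$

Proof. The harmonic mean of a r.v. X is given by $H = \left[E\left(1/X\right)\right]^{-1}$. For the SBKD, using (5),

$$E\left(\frac{1}{X}\right) = \frac{1}{B\left(1 + \frac{1}{a}, b\right)} \int_0^1 a\, x^{a-1} (1 - x^{a})^{b-1}\, dx = \frac{1}{b\, B\left(1 + \frac{1}{a}, b\right)},$$

since $\int_0^1 a\, x^{a-1} (1 - x^{a})^{b-1}\, dx = 1/b$. Thus, for the SBKD, the H.M. is $H = b\, B(1 + 1/a, b)$.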
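The closed form for the harmonic mean can be compared with $1/E(1/X)$ evaluated by quadrature, as in the Python sketch below. The helper names and the values a = 2, b = 4 are illustrative. Note that $b\,B(1 + 1/a, b)$ is exactly the Kumaraswamy mean (4). This is expected for any size-biased distribution of order 1, because $E^{*}(1/X) = E(X \cdot X^{-1})/E(X) = 1/E(X)$.

```python
from scipy import integrate
from scipy.special import beta as beta_fn

def sbkd_pdf(x, a, b):
    """SBKD density (5)."""
    return a * x**a * (1 - x**a)**(b - 1) / beta_fn(1 + 1 / a, b)

def sbkd_harmonic_mean(a, b):
    """H.M. = 1 / E(1/X) = b B(1 + 1/a, b), the same value as the Kumaraswamy mean (4)."""
    return b * beta_fn(1 + 1 / a, b)

a, b = 2.0, 4.0
inv_mean, _ = integrate.quad(lambda x: sbkd_pdf(x, a, b) / x, 0, 1)  # E(1/X) by quadrature
print(1 / inv_mean, sbkd_harmonic_mean(a, b))
```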

The survival and hazard function
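The survival function of the SBKD is given by

$$S(x) = 1 - F(x) = 1 - I_{x^{a}}\!\left(1 + \frac{1}{a}, b\right), \qquad 0 < x < 1.$$

The hazard function of the SBKD is given by

$$h(x) = \frac{f(x)}{S(x)} = \frac{a\, x^{a} (1 - x^{a})^{b-1}}{B\left(1 + \frac{1}{a}, b\right) \left[ 1 - I_{x^{a}}\!\left(1 + \frac{1}{a}, b\right) \right]}, \qquad 0 < x < 1.$$

Parameter estimation of SBKD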

Method of maximum likelihood estimation
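The method of maximum likelihood estimation (MLE) selects the values of the model parameters that maximize the likelihood function. By definition, the method first requires specifying the joint density function of all the observations. For a random sample $x_1, \ldots, x_n$ of size n from the SBKD, the likelihood function is given by

$$L(a, b) = \prod_{i=1}^{n} f(x_i; a, b) = \frac{a^{n}}{\left[B\left(1 + \frac{1}{a}, b\right)\right]^{n}} \prod_{i=1}^{n} x_i^{a} (1 - x_i^{a})^{b-1}, \tag{15}$$

or equivalently, by the log-likelihood function

$$\ell(a, b) = n \ln a - n \ln B\!\left(1 + \frac{1}{a}, b\right) + a \sum_{i=1}^{n} \ln x_i + (b - 1) \sum_{i=1}^{n} \ln(1 - x_i^{a}). \tag{16}$$

To obtain the MLEs of the SBKD parameters, (16) is differentiated with respect to a and b, and the derivatives are equated to 0. Hence the likelihood equations are

$$\begin{aligned}
\frac{\partial \ell}{\partial a} &= \frac{n}{a} + \frac{n}{a^{2}} \left[ \psi\!\left(1 + \frac{1}{a}\right) - \psi\!\left(1 + \frac{1}{a} + b\right) \right] + \sum_{i=1}^{n} \ln x_i - (b - 1) \sum_{i=1}^{n} \frac{x_i^{a} \ln x_i}{1 - x_i^{a}} = 0, \\
\frac{\partial \ell}{\partial b} &= -n \left[ \psi(b) - \psi\!\left(1 + \frac{1}{a} + b\right) \right] + \sum_{i=1}^{n} \ln(1 - x_i^{a}) = 0,
\end{aligned} \tag{17}$$

where ψ(·) is the digamma function, i.e. the logarithmic derivative of the gamma function. The set of equations (17) can be solved by numerical methods.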
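The paper obtains the MLEs with the R package fitdistrplus. The Python sketch below is a substitute, not that routine. It maximizes the log-likelihood (16) with `scipy.optimize.minimize` (L-BFGS-B), supplies the left-hand sides of (17) as the analytic gradient, and uses positivity bounds. The choice of optimizer, the bounds, the starting point (1, 1) and the simulated SBKD(2, 3) data are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaincinv, betaln, digamma

def sbkd_loglik(params, x):
    """Log-likelihood (16) of an SBKD(a, b) sample x."""
    a, b = params
    n = x.size
    return (n * np.log(a) - n * betaln(1 + 1 / a, b)
            + a * np.log(x).sum() + (b - 1) * np.log1p(-x**a).sum())

def sbkd_score(params, x):
    """Left-hand sides of the likelihood equations (17)."""
    a, b = params
    n = x.size
    lx, xa = np.log(x), x**a
    d_a = (n / a + n / a**2 * (digamma(1 + 1 / a) - digamma(1 + 1 / a + b))
           + lx.sum() - (b - 1) * (xa * lx / (1 - xa)).sum())
    d_b = -n * (digamma(b) - digamma(1 + 1 / a + b)) + np.log1p(-xa).sum()
    return np.array([d_a, d_b])

def fit_sbkd_mle(x, start=(1.0, 1.0)):
    """Maximise (16) numerically (L-BFGS-B with positivity bounds is an assumption)."""
    res = minimize(lambda p: -sbkd_loglik(p, x), x0=np.asarray(start, dtype=float),
                   jac=lambda p: -sbkd_score(p, x), method="L-BFGS-B",
                   bounds=[(1e-3, None), (1e-3, None)])
    return res.x

# Simulated data from SBKD(2, 3) via the quantile function (7).
rng = np.random.default_rng(7)
a_true, b_true = 2.0, 3.0
x = betaincinv(1 + 1 / a_true, b_true, rng.uniform(size=5000)) ** (1 / a_true)
a_hat, b_hat = fit_sbkd_mle(x)
print(a_hat, b_hat)
```

Supplying the analytic score keeps (17) and the optimizer consistent. A finite-difference check of `sbkd_score` against `sbkd_loglik` is a cheap way to catch sign errors in the digamma terms.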

Method of quantile matching estimation
The method of matching quantiles computes the matching quantile estimates (MQE) through an iterative procedure based on ordinary least squares (OLS). It matches the theoretical quantiles of the parametric distribution against the empirical quantiles at specified probabilities [15]. The basic idea is to match one distribution by another: the parameters are chosen to minimize the mean squared difference between the quantiles of the two distributions across the chosen levels. This leads to the matching quantile estimates. If $\hat{Q}_p$ is the p th sample quantile, then the equality of theoretical and empirical quantiles is expressed by

$$Q(p_k; \theta) = \hat{Q}_{p_k}, \qquad k = 1, 2, \ldots, d,$$

where d is the number of parameters to be estimated. Quantile matching estimation is available in the R package fitdistrplus [1]. There, a numerical optimization is carried out to minimize the sum of squared differences between the observed and theoretical quantiles. Thus, the MQE of the SBKD can be obtained using the R package fitdistrplus.
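The Python sketch below illustrates the least-squares idea as a stand-in for the fitdistrplus routine used in the paper. It minimizes the squared differences between the SBKD quantiles (7) and the empirical quantiles with `scipy.optimize.least_squares`. Several choices are assumptions: the probabilities (1/3, 2/3), the log-scale parameterization that keeps a and b positive, `np.quantile`'s default (linear-interpolation) sample quantile, and the simulated SBKD(2, 3) data. With d = 2 parameters and two probabilities, the fitted quantiles should match the empirical ones essentially exactly.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import betaincinv

def sbkd_quantile(p, a, b):
    """Quantile function (7) of the SBKD."""
    return betaincinv(1 + 1 / a, b, p) ** (1 / a)

def fit_sbkd_qme(x, probs=(1 / 3, 2 / 3), start=(1.0, 1.0)):
    """Match theoretical and empirical quantiles at `probs` by least squares."""
    probs = np.asarray(probs, dtype=float)
    emp_q = np.quantile(x, probs)  # empirical quantiles Q-hat_{p_k}

    def residuals(log_params):
        a, b = np.exp(log_params)  # optimise on the log scale so that a, b > 0
        return sbkd_quantile(probs, a, b) - emp_q

    res = least_squares(residuals, x0=np.log(start), xtol=1e-12, ftol=1e-12, gtol=1e-12)
    return np.exp(res.x)

rng = np.random.default_rng(3)
x = betaincinv(1 + 1 / 2.0, 3.0, rng.uniform(size=20000)) ** (1 / 2.0)  # SBKD(2, 3) sample
a_hat, b_hat = fit_sbkd_qme(x)
print(a_hat, b_hat)
```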

Simulation study
It was shown in Subsection 3.2 that a random sample of size n can be generated from the SBKD using its quantile function. In this section, random samples with known parameters have been generated, and each sample has been fitted to the SBKD, the Kumaraswamy distribution and the Beta distribution by the method of maximum likelihood. The R package fitdistrplus has been used to obtain the MLEs for the three distributions. The results are summarized in Table 1.

Table 1 shows that, for data simulated from the SBKD, the estimates are close to the actual parameter values. The SBKD also gives a marginally better fit than the Kum and the Beta distribution in terms of the log-likelihood, which is expected because the samples were drawn from the SBKD. Figure 4 plots the standard errors of the estimates of a and b for the simulated samples against increasing sample size. It indicates that, for all the simulated samples, the standard errors of the estimates decrease as the sample size increases. Hence the estimation methods discussed in Section 4 can be used in practice to fit real-life data.
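The Python sketch below gives a rough idea of the kind of experiment summarized in Figure 4. It does not reproduce the paper's values. It repeatedly simulates SBKD(2, 3) samples of increasing size, refits them by maximizing (16) with L-BFGS-B (finite-difference gradients), and reports the Monte Carlo standard deviation of $(\hat a, \hat b)$. That standard deviation is used here as a stand-in for the standard errors reported in the paper. The sample sizes, the 200 replications, the bounds and the seed are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaincinv, betaln

def rsbkd(n, a, b, rng):
    """SBKD(a, b) sample via the quantile function (7)."""
    return betaincinv(1 + 1 / a, b, rng.uniform(size=n)) ** (1 / a)

def neg_loglik(params, x):
    """Negative of the log-likelihood (16)."""
    a, b = params
    n = x.size
    return -(n * np.log(a) - n * betaln(1 + 1 / a, b)
             + a * np.log(x).sum() + (b - 1) * np.log1p(-x**a).sum())

def mle(x):
    res = minimize(neg_loglik, x0=[1.0, 1.0], args=(x,), method="L-BFGS-B",
                   bounds=[(1e-3, 100.0), (1e-3, 100.0)])
    return res.x

rng = np.random.default_rng(11)
a_true, b_true, reps = 2.0, 3.0, 200
spread = {}
for n in (50, 200, 800):
    est = np.array([mle(rsbkd(n, a_true, b_true, rng)) for _ in range(reps)])
    spread[n] = est.std(axis=0, ddof=1)  # Monte Carlo s.d. of (a_hat, b_hat)
    print(n, est.mean(axis=0), spread[n])
```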

Fitting to real life data
In this section, tensile strength data have been fitted to the size-biased Kumaraswamy distribution by the methods of MLE and MQE. The data are available in the R package gamlss.data and contain the tensile strength measurements of 30 polyester fibres. The R package fitdistrplus has been used to obtain both the MLE and the MQE. The fits of the SBKD to these data by MLE and MQE are shown in Figure 5.

The tensile data have also been fitted to the Kumaraswamy distribution and the beta distribution by the same methods. For each fit, the log-likelihood, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) have been obtained. The results for the three distributions, viz. SBKD, Kum and Beta, are summarized in Table 2. Table 2 shows that, in terms of the log-likelihood, the SBKD gives a marginally better fit to the tensile strength data than the Kum and Beta distributions.
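The SBKD, Kum and Beta models each have k = 2 parameters, and all three are fitted to the same n = 30 observations. Under the standard definitions of AIC and BIC, stated here as background rather than taken from Table 2, the three models therefore receive identical penalties. Their ranking by AIC or BIC should then coincide with their ranking by the maximized log-likelihood $\hat{\ell}$:

$$\mathrm{AIC} = 2k - 2\hat{\ell} = 4 - 2\hat{\ell}, \qquad \mathrm{BIC} = k \ln n - 2\hat{\ell} = 2 \ln 30 - 2\hat{\ell} \approx 6.80 - 2\hat{\ell}.$$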

Conclusions
We have proposed a size-biased version of the Kumaraswamy distribution, which can be employed in modeling data from hydrology, forestry and various other related fields. Special cases of the SBKD have been discussed. Structural properties of the model have been derived and discussed, including the cumulative distribution function, the quantile function, the moments and the shape for varying values of the parameters. Two methods for estimating the parameters of the model, viz. MLE and MQE, were studied. Using simulated data, we have shown that these methods can provide reasonably good estimates of the parameters, and that the standard errors of the estimates decrease as the sample size increases. The model has been applied to a real dataset. The results indicate that it is potentially a better candidate than either the beta or the Kumaraswamy distribution, in terms of a greater likelihood.