A Generalized Modification of the Kumaraswamy Distribution for Modeling and Analyzing Real-Life Data

In this paper, a generalized modification of the Kumaraswamy distribution is proposed, and its distributional and characterizing properties are studied. This distribution is closed under scaling and exponentiation, and has several well-known distributions as special cases, such as the generalized uniform, triangular, beta, power function, Minimax, and some other Kumaraswamy-related distributions. Its moment generating function, Lorenz and Bonferroni curves, and moments (the mean, variance, moments about the origin, and the harmonic, incomplete, probability weighted, L, and trimmed L moments) are derived. The maximum likelihood estimation method is used to estimate its parameters and is applied to six different simulated data sets from this distribution, in order to check the performance of the estimation method through the mean squared errors of the estimated parameters computed for different simulated sample sizes. Finally, four real-life data sets are used to illustrate the usefulness and flexibility of this distribution in application to real-life data.


Introduction
Many popular statistical distributions are used for modeling and analyzing real data, and so are their generalizations obtained by mixing these distributions with each other. In particular, generalizing a well-known statistical distribution provides flexible and effective modeling when the standard distribution cannot statistically fit the data. Even these generalized distributions may have limitations in modeling and analyzing some real data, especially non-standard and abnormal data. New modifications therefore need to be considered and studied so that such non-standard real-life data can be modeled and analyzed.
The Kumaraswamy distribution was first introduced in 1980 by Kumaraswamy [49] as a probability density function for double-bounded random processes, and many researchers have since studied the distribution further. Garg [27], Nadarajah [59], Jones [45], Mitnik [56], Gholizadeh et al [29], and Mitnik [57] developed further theoretical research on the distribution. In particular, Gholizadeh et al [29] considered classical and Bayesian point and interval estimators for the shape parameter of the Kumaraswamy distribution using Monte-Carlo simulation. Hussian [36] used maximum likelihood and Bayesian estimation methods to estimate the parameters of generalizations and modifications of this distribution. These works were among the main reasons that prompted the present study.
In this paper, a generalized modification of the Kumaraswamy distribution is proposed in Section 2, and its properties, consisting of boundaries, limits, mode, quantiles, reliability and hazard functions, and Renyi entropy, are studied in Section 3. In Section 4, we consider distributions related to this distribution as special cases. In Section 5, the distribution of its order statistics is derived. In Section 6, its mean deviations, moment generating function, moments (the mean, variance, moments about the origin, and the harmonic, incomplete, probability weighted, L, and trimmed L moments), and Lorenz and Bonferroni curves are obtained. The maximum likelihood estimation method for estimating its parameters is given in Section 7, and its results are applied in Section 8 to simulated data sets from six different models of this distribution, in order to check the performance of the estimation method through the mean squared errors of the estimated parameters computed for different simulated sample sizes. Finally, in Section 9, four real-life data sets are used to show the usefulness and flexibility of this distribution in application to real-life data.

The Kumaraswamy Distribution and Its Generalized Modification
Definition 1 (Kumaraswamy [49]): The rv X having a probability density function (pdf) f given by;
$$f(x; a, b) = a\,b\,x^{a-1}\left(1 - x^{a}\right)^{b-1}, \quad 0 \le x \le 1, \qquad (1)$$
where a and b are positive numbers, is said to have the Kumaraswamy distribution with parameters a and b. We note that the domain of the function f given by (1) is [0,1], and the parameters a and b are shape parameters.
A generalized modification of (1) is given below. Let 0 < a, b, c, α, β < ∞, such that α < β, and define the function f by: Let us write f(x) instead of f(x; a, b, c, α, β) for simplicity. Now, we have the following proposition.
Proposition 1: The function f defined by (2) is a pdf and its cumulative distribution function (CDF) F is given by;
Proof: Since 0 < a, b, c, α, β < ∞, α < β, and aα It follows that, for any x such that, aα
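To make Definition 1 concrete, the following minimal sketch evaluates the standard Kumaraswamy pdf, its CDF 1 − (1 − x^a)^b, and the corresponding quantile function, and draws variates by inverse-transform sampling. It covers only the two-parameter distribution of Definition 1, not the GMKD; the function names are illustrative choices and are not taken from the paper.

```python
import random


def kumaraswamy_pdf(x, a, b):
    """Kumaraswamy pdf a*b*x^(a-1)*(1-x^a)^(b-1) on (0, 1), zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    return a * b * x ** (a - 1.0) * (1.0 - x ** a) ** (b - 1.0)


def kumaraswamy_cdf(x, a, b):
    """Kumaraswamy CDF F(x) = 1 - (1 - x^a)^b, clamped outside [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 1.0 - (1.0 - x ** a) ** b


def kumaraswamy_quantile(p, a, b):
    """Inverse of the CDF: Q(p) = (1 - (1 - p)^(1/b))^(1/a) for 0 <= p < 1."""
    return (1.0 - (1.0 - p) ** (1.0 / b)) ** (1.0 / a)


def kumaraswamy_sample(n, a, b, rng=None):
    """Draw n variates by inverse-transform sampling of uniform numbers."""
    rng = rng or random.Random()
    return [kumaraswamy_quantile(rng.random(), a, b) for _ in range(n)]


if __name__ == "__main__":
    # The median is Q(1/2); plugging it back into the CDF should return 0.5.
    median = kumaraswamy_quantile(0.5, 2.0, 3.0)
    print(median, kumaraswamy_cdf(median, 2.0, 3.0))
```

Because the CDF inverts in closed form, sampling needs no rejection step.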

Definition 2:
The rv X is said to have a generalized modification of the Kumaraswamy distribution (GMKD) with parameters a, b, c, α and β, written as X ∼ GMKD(a, b, c, α, β), if its pdf is given by (2), or equivalently, its CDF is given by (3). Figure 1 shows some plots of the pdf of the GMKD for some values of its parameters, indicating that this distribution can take a wide variety of shapes. We first note that f(x; 1, b, c, 1, 1) is the pdf of the Kumaraswamy distribution with parameters b and c, given by (1). Furthermore, the following proposition gives the transformation that can be used to modify the Kumaraswamy pdf given in (1) in order to obtain what we have called the GMKD.

Proposition 2:
Let the rv Y be a rv having the Kumaraswamy distribution with parameters b and c, and the rv First, it is easy to find that the CDF of Y is given, for 0 < y < 1, by; Since 0 < a, b, c, α, β < ∞ and α < β, then 0 < y < 1 is equivalent to aα

Some Properties of the GMKD

Boundaries and Some Limits of the pdf
Let us study the behavior of the pdf of the GMKD(a, b, c, α, β). At the boundary points, we have from (2) that; and; and lim b→1 f ( aα
Stat., Optim. Inf. Comput. Vol. 8, June 2020, Rafid S. A. Alshkaki

Quantile Function
Let 0 < p < 1, then the quantile function Q of the rv X ∼ GMKD(a, b, c, α, β), defined by; can be found using (3), to be; In particular, the median of X, Med(X), is given by;

Renyi Entropy
Let us compute the Renyi entropy as a measure of variation of the uncertainty of the rv X ∼ GMKD(a, b, c, α, β). For θ > 0 such that θ ≠ 1, we have for the rv X ∼ GMKD(a, b, c, α, β) that; Using the transformation given by (4), we have that; where B*(a, b; c) is defined by; Or equivalently; where B and B_y are, respectively, the beta and the incomplete beta functions, Abramowitz and Stegun ([3], p. 258), defined by; and We may call B*(a, b; z) given by (9), "the upper beta function at z".
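As a hedged illustration of the entropy computation, the sketch below evaluates the Renyi entropy (1/(1 − θ)) log ∫ f^θ dx for the standard Kumaraswamy pdf of Definition 1 rather than for the GMKD, whose explicit formulas are not reproduced here. Substituting w = x^a gives ∫ f^θ dx = a^(θ−1) b^θ B((θ(a − 1) + 1)/a, θ(b − 1) + 1), valid when both beta arguments are positive; the code compares this closed form with a midpoint-rule integral. All function names are illustrative assumptions.

```python
import math


def beta_fn(p, q):
    """Complete beta function B(p, q) computed through log-gamma."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


def renyi_entropy_closed(a, b, theta):
    """Renyi entropy of Kumaraswamy(a, b) from the beta-function closed form."""
    p = (theta * (a - 1.0) + 1.0) / a
    q = theta * (b - 1.0) + 1.0
    if theta <= 0.0 or theta == 1.0 or p <= 0.0 or q <= 0.0:
        raise ValueError("need theta > 0, theta != 1 and positive beta arguments")
    integral = a ** (theta - 1.0) * b ** theta * beta_fn(p, q)
    return math.log(integral) / (1.0 - theta)


def renyi_entropy_numeric(a, b, theta, steps=20000):
    """Same quantity by a midpoint rule on (0, 1); suited to a >= 1 and b >= 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        f = a * b * x ** (a - 1.0) * (1.0 - x ** a) ** (b - 1.0)
        total += f ** theta
    return math.log(total * h) / (1.0 - theta)


if __name__ == "__main__":
    print(renyi_entropy_closed(2.0, 3.0, 2.0), renyi_entropy_numeric(2.0, 3.0, 2.0))
```

For a = b = 1 the pdf is uniform on (0, 1) and the entropy is zero for every admissible θ, which gives a quick sanity check.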

Related Distributions of GMKD
has an exponential distribution with parameters θ and b, Johnson et al [43] p. 494, with CDF given by; has a Gumbel (generalized extreme value type-I) distribution with parameters µ and β, Forbes et al [26] p. 98, with CDF given by; has a logistic distribution with parameters α and β, Johnson et al [44] p. 115, with CDF given by; where k is a positive constant, has a Pareto distribution with parameters k and b, Johnson et al [43] p. 574, with CDF given by; ] 1 θ has a Weibull distribution with parameters ξ, α, and θ, Johnson et al [43] p. 629, with its CDF given by; Therefore, the rv Y has a Pareto distribution with parameters γnd0. Proof of (1.2) through (1.5) can be shown on the same lines as the proof of (1.1).

Lemma 2:
δ .e − γ δ has a log-logistic distribution with parameters δ and γ, Johnson et al [42] p. 151, with CDF given by; has the generalized uniform distribution, Tiwari et al [76], with CDF given by; has the beta distribution with parameters 1 and c, with CDF given by;

Proof:
Along the same lines as the proof of Lemma 1.

Order Statistics
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from GMKD(a, b, c, α, β), and let $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ be their order statistics, then for i = 1, 2, 3, ..., n, the pdf of the i-th order statistic $X_{i:n}$ is given for by; (2) and (3) that; Since i = 1, 2, 3, ..., n, we have that; Hence, using (13), we can write (12) as; And using the fact that the pdf f of the rv X, given by (2), satisfies that; Then the pdf $f_{i:n}$ of the rv $X_{i:n}$, given by (12), can be written as;

Moments

Moments about the Origin
Let k = 1, 2, 3, ..., then the moment of the rv X ∼ GMKD(a, b, c, α, β) of order k about zero is given by; Using the transformation given by (4), we have that; where B*(·, ·; z) is "the upper beta function at z" given by (9). Now expand $(1 - w)^{c-1}$ in the integral of (15), using the binomial series expansion, Abramowitz and Stegun [3] p. 14, to get that; Therefore, an interesting relation for the upper beta function can be seen from (16) and (17), and is given by; or equivalently, in a general form, given in the following proposition;

Proposition 6:

Mean and Variance
Using (16), the mean of X ∼ GMKD(a, b, c, α, β) is given by; And hence the variance; In particular, if α = 0 and β = 1, then and;
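For orientation, the mean and variance can be illustrated on the standard Kumaraswamy pdf of Definition 1, whose raw moments have the well-known form E[X^k] = b B(1 + k/a, b). The sketch below is not the GMKD formula (16); it uses the two-parameter distribution as a stand-in, checks the closed form against a numerical integral of Q(u)^k over (0, 1), and uses illustrative function names.

```python
import math


def beta_fn(p, q):
    """Complete beta function B(p, q) computed through log-gamma."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


def kumaraswamy_raw_moment(k, a, b):
    """Raw moment E[X^k] = b * B(1 + k/a, b) of Kumaraswamy(a, b)."""
    return b * beta_fn(1.0 + k / a, b)


def kumaraswamy_mean_var(a, b):
    """Mean and variance obtained from the first two raw moments."""
    m1 = kumaraswamy_raw_moment(1, a, b)
    m2 = kumaraswamy_raw_moment(2, a, b)
    return m1, m2 - m1 * m1


def numeric_raw_moment(k, a, b, steps=20000):
    """Midpoint-rule check using E[X^k] = integral of Q(u)^k over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        q = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
        total += q ** k
    return total * h


if __name__ == "__main__":
    print(kumaraswamy_mean_var(2.0, 3.0), numeric_raw_moment(1, 2.0, 3.0))
```

With a = b = 1 the distribution is uniform on (0, 1), so the function returns mean 1/2 and variance 1/12.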

The Moment Generating Function
Similarly, the moment generating function of the rv X ∼ GMKD(a, b, c, α, β), M X (t) , can be found to be;

Harmonic Mean
The harmonic mean of X ∼ GMKD(a, b, c, α, β), on the same lines as that of the moment of X, is given by;

Mean Deviations
The mean deviation of X about its mean µ = E(X), MD(µ), is given by; which can be found, Cordeiro et al [15], to be; Hence, using (3), (18) and (19), for the rv X ∼ GMKD(a, b, c, α, β), we have that; Similarly, the mean deviation of X about its median m, MD(m), is given by;

Probability Weighted Moments
The probability weighted moment of order (s, r) of X ∼ GMKD(a, b, c, α, β), $\rho_{s,r}$, is given by; Using the transformation given in (4), we have that;

L-Moments
Let r = 1, 2, 3, ..., then the r-th L-moment $\gamma_r$ of X ∼ GMKD(a, b, c, α, β) is given by; which, with the use of (20), can be found to be; In particular, the first two L-moments are given by; And; Hence, the L-coefficient of variation τ, Bilkova [9], is defined by;
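The first two L-moments and the L-coefficient of variation can be sketched numerically for any distribution with a known quantile function, using the standard relations λ1 = β0, λ2 = 2β1 − β0 and τ = λ2/λ1, where β_r = ∫ Q(u) u^r du over (0, 1) is a probability weighted moment. The example below is an assumption-laden illustration: it uses the standard Kumaraswamy quantile function as a stand-in for the GMKD, and the names `pwm_beta` and `l_moments` are hypothetical.

```python
def kumaraswamy_quantile(u, a, b):
    """Quantile function of the standard Kumaraswamy(a, b) distribution."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def pwm_beta(quantile, r, steps=20000):
    """Probability weighted moment beta_r = integral of Q(u) * u^r on (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += quantile(u) * u ** r
    return total * h


def l_moments(quantile, steps=20000):
    """Return (lambda_1, lambda_2, tau) with tau = lambda_2 / lambda_1."""
    b0 = pwm_beta(quantile, 0, steps)
    b1 = pwm_beta(quantile, 1, steps)
    lam1 = b0
    lam2 = 2.0 * b1 - b0
    return lam1, lam2, lam2 / lam1


if __name__ == "__main__":
    print(l_moments(lambda u: kumaraswamy_quantile(u, 2.0, 3.0)))
```

For the uniform case a = b = 1 the exact values are λ1 = 1/2, λ2 = 1/6 and τ = 1/3, which the numerical result should match closely.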

The L and the Trimmed L-Moments
Let r = 1, 2, 3, ..., then the r-th trimmed L-moment (TL-moment) of the rv X ∼ GMKD(a, b, c, α, β), with the use of (20), can be found to be; Hence, the r-th L-moment of the rv X, $\gamma_r$, can be found, since the L-moments are a special case of the TL-moments, namely when s = t = 0, that is, $\gamma_r = \gamma_r^{(0,0)}$.

Lorenz and Bonferroni Curves
For 0 < π < 1, the Lorenz curve, L(π), and the Bonferroni curve, B(π), for the rv X ∼ GMKD(a, b, c, α, β), are given by; where Q(π) is the quantile function of the rv X at π, and I(z, k) is the incomplete moment of the rv X given by (19). Therefore, using (16) and (19), we have that; And similarly, that;
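A rough numerical sketch of the two curves follows. It relies on the standard identities L(π) = (1/µ) ∫ Q(u) du over (0, π), which equals the first incomplete moment up to Q(π) divided by the mean, and B(π) = L(π)/π. The standard Kumaraswamy quantile function is used here as an assumed stand-in for the GMKD, and the function names are illustrative.

```python
def kumaraswamy_quantile(u, a, b):
    """Quantile function of the standard Kumaraswamy(a, b) distribution."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def integrate_quantile(quantile, upper, steps=20000):
    """Midpoint-rule integral of Q(u) over (0, upper)."""
    h = upper / steps
    return sum(quantile((i + 0.5) * h) for i in range(steps)) * h


def lorenz(pi, quantile, steps=20000):
    """Lorenz curve L(pi): share of the mean carried by the lowest pi fraction."""
    mean = integrate_quantile(quantile, 1.0, steps)
    return integrate_quantile(quantile, pi, steps) / mean


def bonferroni(pi, quantile, steps=20000):
    """Bonferroni curve B(pi) = L(pi) / pi for 0 < pi < 1."""
    return lorenz(pi, quantile, steps) / pi


if __name__ == "__main__":
    q = lambda u: kumaraswamy_quantile(u, 2.0, 3.0)
    print(lorenz(0.5, q), bonferroni(0.5, q))
```

In the uniform case a = b = 1 the curves reduce to L(π) = π² and B(π) = π, a convenient check on the numerics.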

Parameters Estimation of the GMKD
We will use the maximum likelihood estimation (MLE) method for estimating the parameters of the GMKD. Let $x_1, x_2, \ldots, x_n$ be a random sample from GMKD(a, b, c, α, β), as given by (2), then the likelihood function $L = L(a, b, c, \alpha, \beta; x_1, x_2, \ldots, x_n)$ can be written as, Therefore; Since $\frac{\partial^2}{\partial a^2}\log L$ can be shown to be not in a simple form, a local maximum of L at a has to be explicitly examined. Now; Hence also, $\frac{\partial^2}{\partial b^2}\log L$ can be shown to be not in a simple form, and therefore a local maximum of L at b has to be explicitly examined. Now; Hence; Therefore, $\frac{\partial^2}{\partial c^2}\log L < 0$, which indicates that L has a local maximum at c. Similarly, Hence; Or equivalently; Therefore, if $a_1 > a_2$ and $c < \frac{a_2}{a_1 - a_2}$, then $\frac{\partial^2}{\partial \beta^2}\log L < 0$, or if $a_1 < a_2$ and $c > \frac{a_2}{a_1 - a_2}$, then $\frac{\partial^2}{\partial \beta^2}\log L < 0$, and therefore, a local maximum of L at β has to be explicitly examined. Finally; implying that $\frac{\partial}{\partial \alpha}\log L > 0$ and that $\frac{\partial^2}{\partial \alpha^2}\log L > 0$, then an alternative way to find the MLE of α has to be considered. Since aα
Now, letting $\frac{\partial}{\partial a}\log L = 0$, we have from (21) that; And letting $\frac{\partial}{\partial b}\log L = 0$, then from (22) we have that; Similarly, letting $\frac{\partial}{\partial c}\log L = 0$, then from (23) we have that; Hence, from (25) we have that; And finally, letting $\frac{\partial}{\partial \beta}\log L = 0$, then from (24), we have that; Then the MLE of the parameters a, b, c, α and β can be found by solving the following equations;
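As a simpler worked case of the second-derivative check, consider the standard Kumaraswamy pdf with parameters b and c, $f(y) = b\,c\,y^{b-1}(1 - y^{b})^{c-1}$ on (0, 1), which the text identifies as a special case of the GMKD. The derivation below is only an illustrative sketch under that simplification, with $y_1, \ldots, y_n$ denoting a hypothetical sample on (0, 1); it is not the GMKD likelihood itself.

```latex
\begin{align*}
\log L(b, c) &= n\log b + n\log c + (b-1)\sum_{i=1}^{n}\log y_i
               + (c-1)\sum_{i=1}^{n}\log\left(1 - y_i^{b}\right), \\
\frac{\partial}{\partial c}\log L &= \frac{n}{c} + \sum_{i=1}^{n}\log\left(1 - y_i^{b}\right), \\
\frac{\partial^{2}}{\partial c^{2}}\log L &= -\frac{n}{c^{2}} < 0, \\
\hat{c} &= -\frac{n}{\sum_{i=1}^{n}\log\left(1 - y_i^{b}\right)} \quad \text{(for fixed } b\text{)}.
\end{align*}
```

In this special case the log-likelihood is strictly concave in c for any fixed b, which agrees with the sign of the second derivative with respect to c reported for the GMKD.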

A Simulation Study
Using the results given in Section 7 and the Absoft Pro Fortran compiler for computing, data sets from different GMKD models were simulated, in order to check the performance of the MLEs of the parameters of each model through their mean squared errors (MSE) computed from different simulated sample sizes. The steps are given below;
1. Six different GMKD models are considered, which have different pdf shapes and variable ranges.
2. Six sample sizes, namely 15, 30, 50, 100, 200, and 300, are used.
3. For each sample size, 5,000 random variates are generated from each of the given GMKD models.
4. For each sample size and for each GMKD model, the parameters are estimated using the MLE method given in Section 7.
5. The means, standard deviation (SD), bias, and MSE for each of the parameters are computed for each random sample for each sample size of the given GMKD models.
Table 2 shows the actual and the MLE parameter values of the different simulated GMKD data sets, and Figure 2 shows their corresponding pdf plots. Tables 3a and 3b present the bias of the parameters of the different simulated GMKD data sets for each sample size, while Tables 4a and 4b present the corresponding MSE.
Table 5 shows some statistics for the actual grades data sets. Table 6a shows the actual and predicted frequencies of the first and second-semester Mathematics course examination grades, with the model parameter estimates and the chi-square goodness of fit for the proposed GMKD and the Kumaraswamy power function distribution (KPFD), which is close to the GMKD (see Section 4.1). Table 6b shows the same quantities for the first and second-semester Introductory Statistics course examination grades. For each of the four grade data sets, the p-values of the chi-square tests under the GMKD model indicate a very good fit statistically, and a better fit than all of the KPFD models. These results can also be seen visually in Figure 4, which shows the histograms and the fitted pdfs for each of the grade data sets.
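The Monte Carlo steps of the simulation study (simulate, estimate by MLE, then summarize bias and MSE by sample size) can be sketched in a much reduced form. The example below is not the paper's Fortran implementation and does not use the GMKD: as a labeled simplification it samples from the standard Kumaraswamy pdf of Definition 1 with the first shape parameter a treated as known, in which case setting the derivative of the log-likelihood with respect to b to zero gives the closed-form estimate b̂ = −n / Σ log(1 − x_i^a). The replication count and all function names are illustrative assumptions.

```python
import math
import random


def sample_kumaraswamy(n, a, b, rng):
    """Inverse-transform sample of size n from the standard Kumaraswamy(a, b)."""
    return [(1.0 - (1.0 - rng.random()) ** (1.0 / b)) ** (1.0 / a) for _ in range(n)]


def mle_b_given_a(xs, a):
    """Closed-form MLE of b when a is known: -n / sum(log(1 - x^a))."""
    return -len(xs) / sum(math.log(1.0 - x ** a) for x in xs)


def simulate(a, b, sizes, reps, seed=0):
    """For each sample size return (mean, SD, bias, MSE) of the MLE of b."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        estimates = [mle_b_given_a(sample_kumaraswamy(n, a, b, rng), a)
                     for _ in range(reps)]
        mean = sum(estimates) / reps
        sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (reps - 1))
        bias = mean - b
        mse = sum((e - b) ** 2 for e in estimates) / reps
        results[n] = (mean, sd, bias, mse)
    return results


if __name__ == "__main__":
    # Sample sizes follow step 2; reps is kept small here for speed.
    for n, stats in simulate(2.0, 3.0, [15, 30, 50, 100, 200, 300], reps=500).items():
        print(n, stats)
```

In this simplified setting the MSE shrinks as the sample size grows, which is the qualitative pattern the simulation study is designed to check.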

* The number of intervals was adjusted in order to make the expected number of observations in each interval equal to or greater than 5, which in turn affected the number of degrees of freedom.
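The interval adjustment described in the table note can be sketched as follows: adjacent intervals are pooled until every pooled interval has an expected count of at least 5, and the chi-square statistic and its degrees of freedom are then computed on the pooled intervals. This is a minimal sketch of one common pooling rule (left to right, with any leftover tail merged into the last interval); the paper does not state which rule was used, and the function names and the example counts are hypothetical.

```python
def merge_small_bins(observed, expected, min_expected=5.0):
    """Pool adjacent bins left to right until each expected count >= min_expected."""
    merged_obs, merged_exp = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0.0:
        # Leftover tail that never reached the threshold joins the last bin.
        if merged_exp:
            merged_obs[-1] += acc_o
            merged_exp[-1] += acc_e
        else:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
    return merged_obs, merged_exp


def chi_square_gof(observed, expected, n_estimated_params, min_expected=5.0):
    """Pearson chi-square statistic and degrees of freedom after pooling."""
    obs, exp = merge_small_bins(observed, expected, min_expected)
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    dof = len(obs) - 1 - n_estimated_params
    return stat, dof


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper's tables.
    print(chi_square_gof([1, 2, 10, 12, 3, 2], [2, 3, 9, 11, 3, 2], n_estimated_params=1))
```

Pooling reduces the number of intervals, so the degrees of freedom (intervals minus one minus the number of estimated parameters) fall accordingly, which is the effect the note refers to.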

Summary
A new generalized modification of the Kumaraswamy distribution is introduced, and its properties, consisting of boundaries, limits, mode, quantiles, reliability and hazard functions, and Renyi entropy, are studied, and some of its various shapes are given to show its flexibility. This distribution is closed under scaling and exponentiation. Some well-known distributions, such as the generalized uniform, triangular, beta, power function, Minimax, and some other Kumaraswamy-related distributions, are special cases of this distribution. Its order statistics, mean deviations, moment generating function, Lorenz and Bonferroni curves, and moments (the mean, variance, moments about the origin, and the harmonic, incomplete, probability weighted, L, and trimmed L moments) are derived. The maximum likelihood estimation method is used to estimate its parameters and is applied to simulated data sets from six different models of this distribution, having different pdf shapes, in order to check the performance of the estimation method through the mean squared errors of the estimated parameters, which are shown to decrease as the sample size increases.
Finally, four real-life data sets of students' grades at Ahmed Bin Mohammed Military College, Doha, Qatar, are used to show the usefulness and flexibility of this distribution in application to real-life data sets, including our examination grades data sets. The first two sets represent the first and second semester mathematics course examination grades for the academic semesters Spring 2011 to Fall 2018, while the third and fourth sets represent the first and second semester introductory statistics course examination grades for the academic semesters Spring 2011 to Spring 2018. We also fitted Kumaraswamy power function distribution models, which are close to our proposed distribution. The results are very good, both statistically, via the chi-square goodness of fit tests, and visually, via the histograms and the fitted pdfs for each of our examination grade data sets, and are better than those of all the Kumaraswamy power function distribution models.