A New Weighted Skew Normal Model

Weighted distributions are a valuable tool for constructing flexible models and analyzing data sets. In this paper, a new four-parameter weighted skew-normal distribution is introduced. The proposed model generalizes several distributions, such as the normal, bimodal normal, skew-normal, and skewed bimodal normal-normal distributions. The weighted model is form-invariant under the proposed weight function. The basic characteristics of the model are presented, together with a method for generating data from it. The maximum likelihood estimates of the parameters are given and evaluated in a simulation study. The model is fitted to three real data sets, and its advantage over rival distributions is shown using appropriate criteria.


Introduction
Many data sets are bimodal, whether symmetric or asymmetric, and the normal distribution is not suitable for fitting them. Azzalini [7] introduced the skew-normal distribution to model unimodal asymmetric data. This distribution has a skewness parameter λ and is denoted by SN(λ). Its probability density function (pdf) is
f(x; λ) = 2ϕ(x)Φ(λx), λ ∈ R, (1)
where ϕ(·) and Φ(·) are the pdf and cumulative distribution function (cdf) of the standard normal distribution, respectively. By replacing ϕ(x) in (1) with another symmetric pdf, other skew distributions, known as skew-symmetric distributions, were introduced based on the lemma of Azzalini (1985) [7].
Lemma 1 (Azzalini [7]). Let f be a pdf symmetric about zero, and let H be the cdf of a distribution symmetric about zero. Then
g(x; λ) = 2f(x)H(λx)
is a pdf for any λ ∈ R.

Using this lemma, Gomez et al. [13] and Nekoukhou and Alamatsaz [30] introduced the skew t-normal distribution and the skew symmetric-Laplace distribution, respectively. Generalizations of the skew-normal distribution of Azzalini [7] have been discussed by many authors. Nadarajah and Kotz

where the weight parameter β may be known or unknown. When w(x, β) = x^β, the weighted distribution is called size-biased of order β. The pdf of the general weighted skew-normal model is
f_w(x; λ, β) = w(x, β) f(x; λ) / C,
where w(x, β) denotes the weight function with weight parameter β, f(x; λ) is the SN(λ) pdf in (1), and C represents the normalization constant, given by
C = ∫ w(x, β) f(x; λ) dx, with the integral taken over R.

In this paper the weight function w(x, β) = |x|^β is considered and is called the absolute-power weight function of order β. The corresponding weighted model is called the weighted absolute-power skew normal of order β. The symmetric version (λ = 0) of the model is called the weighted absolute-power normal of order β (or bimodal normal of order β). This symmetric distribution, denoted by BN(β), was introduced by Alavi [1]. The pdf of BN(β) is given by
f(x; β) = |x|^β exp(−x²/2) / (2^((β+1)/2) Γ((β + 1)/2)), x ∈ R,
where β ≥ 0 is the mode parameter and Γ(·) is the gamma function.
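The densities introduced above can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the SBNN(λ, β) pdf is obtained from Lemma 1 with f taken as the BN(β) pdf and H = Φ, that is, 2 f_BN(x; β) Φ(λx). The function names bn_pdf, sbnn_pdf and integrate are hypothetical and chosen only for illustration.

```python
import math

def norm_pdf(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cdf, computed from the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bn_pdf(x, beta):
    """BN(beta) pdf: |x|^beta exp(-x^2/2) / (2^((beta+1)/2) Gamma((beta+1)/2))."""
    log_const = 0.5 * (beta + 1.0) * math.log(2.0) + math.lgamma(0.5 * (beta + 1.0))
    if x == 0.0:
        # |0|^beta is 1 for beta = 0 and 0 for beta > 0
        return math.exp(-log_const) if beta == 0 else 0.0
    return math.exp(beta * math.log(abs(x)) - 0.5 * x * x - log_const)

def sbnn_pdf(x, lam, beta):
    """Assumed SBNN(lam, beta) pdf: 2 * f_BN(x; beta) * Phi(lam * x)."""
    return 2.0 * bn_pdf(x, beta) * norm_cdf(lam * x)

def integrate(f, a=-12.0, b=12.0, n=100_000):
    """Midpoint rule on [a, b]; the tails beyond +/-12 are negligible here."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Total mass for lam = 1, beta = 1 (should be close to 1)
total_mass = integrate(lambda x: sbnn_pdf(x, 1.0, 1.0))
# Second moment for beta = 1 (equals beta + 1 under the assumed density)
second_moment = integrate(lambda x: x * x * sbnn_pdf(x, 1.0, 1.0))
```

With λ = 0 and β = 0 the assumed density reduces to the standard normal pdf, which is consistent with the normal distribution being a special case of the model.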

Some Properties
Some properties of SBNN(λ, β) are presented in this section.
Lemma 2. Suppose f(y) is the pdf of a random variable Y having a distribution symmetric about zero, and H is the cdf of a distribution symmetric about zero. If the pdf of a random variable X is given by

Proof. See Azzalini (1986) [6].

For k odd
where c = (k + β + 1)/2 and the double factorial operator for odd and even numbers is equal to

Proof. Without loss of generality, we assume that λ ≥ 0 in (7).
I. According to Lemma 2
By the transformation u = −y in the first integral, we have:
II. According to Theorem 4, we have:
On the other hand
Assuming c = (k + β + 1)/2 and using the standard extension

Stat., Optim. Inf. Comput. Vol. 10, September 2022
F. NAGHIBI, S. M. R. ALAVI AND R. CHINIPARDAZ

As an example, if β = 2 and k = 1, then c = 2 and E( . Following the idea described by Azzalini [8], the following theorem is presented for generating data from SBNN(λ, β).

Theorem 6
Suppose that Z ∼ N(0, 1) and Y ∼ BN(β) are two independent random variables. If the random variable X is defined as
X = Y if Z ≤ λY, and X = −Y otherwise,
then X ∼ SBNN(λ, β).

Using Theorem 6, data can be generated from SBNN(λ, β) by the following steps:
1. Generate a random value Z from the standard normal distribution.
2. Generate a random value Y from BN(β), independently of Z.
3. Set X = Y if Z ≤ λY, and X = −Y otherwise.
When these three steps are repeated n times, a random sample of size n is generated from SBNN(λ, β). The exact and simulated densities of SBNN(1, 1) are shown in Figure 2.
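The generation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation. It assumes the usual Azzalini-type representation X = Y if Z ≤ λY and X = −Y otherwise. It also assumes a way of drawing from BN(β) that is not taken from the paper: since the BN(β) pdf is proportional to |y|^β exp(−y²/2), the square Y² follows a gamma distribution with shape (β + 1)/2 and scale 2, so Y can be drawn as a random sign times the square root of such a gamma variate. The names sample_bn and sample_sbnn are hypothetical.

```python
import math
import random

def sample_bn(beta, rng):
    """Draw one value from BN(beta).

    Assumption: Y^2 ~ Gamma(shape=(beta+1)/2, scale=2), with an
    independent random sign attached to sqrt(Y^2).
    """
    w = rng.gammavariate(0.5 * (beta + 1.0), 2.0)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * math.sqrt(w)

def sample_sbnn(lam, beta, n, seed=None):
    """Draw n values from SBNN(lam, beta) by repeating the three steps."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)       # step 1: Z ~ N(0, 1)
        y = sample_bn(beta, rng)      # step 2: Y ~ BN(beta), independent of Z
        sample.append(y if z <= lam * y else -y)  # step 3: keep or flip the sign
    return sample

xs = sample_sbnn(lam=1.0, beta=1.0, n=50_000, seed=1)
mean_x = sum(xs) / len(xs)
mean_x2 = sum(x * x for x in xs) / len(xs)
```

Because the sign flip does not change Y², the second moment of the sample should be close to β + 1 for any λ, which gives a quick sanity check on the generator. For λ > 0 the sample mean should be positive.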

The maximum likelihood estimation of parameters
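Suppose x_1, x_2, ..., x_n is an observed random sample from SBNN(µ, σ, λ, β). Then the log-likelihood function is
ℓ(µ, σ, λ, β) = −n(β + 1) log σ − (n(β − 1)/2) log 2 − n log Γ((β + 1)/2) + β Σ log|x_i − µ| − (1/(2σ²)) Σ (x_i − µ)² + Σ log Φ(λ(x_i − µ)/σ), (18)
where each sum runs over i = 1, ..., n. The maximum likelihood estimates (MLEs) of µ, σ, λ and β are obtained by solving the following equations simultaneously, using numerical methods such as Newton-Raphson iteration.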

where Ψ_0(z) = (d/dz) log Γ(z). The MLEs of the parameters can be calculated using the optim or nlminb functions in R. The existence of the MLE of the parameter λ in the SN distribution is discussed in [23] and [32], where it is explained that if at least two elements of the sample have different signs, then the MLE of λ exists. The same result is given in [27] for each class of distributions with the following pdf where h(·) is a symmetric pdf. Therefore, this condition is necessary for the existence of the MLE of λ, because the SBNN is a member of this class. The asymptotic distribution of the MLEs of the parameters is multivariate normal with mean vector (µ, σ, λ, β)′ and covariance matrix equal to the inverse of the Fisher information matrix. The score vector and Hessian matrix are given in the "Appendix". Because closed forms are not available for the MLEs, they are evaluated using a simulation study.
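The log-likelihood can be written as a short function that a numerical optimizer can maximize. The sketch below is an assumption-laden illustration, not the authors' code. It assumes the location-scale density (1/σ) · 2 f_BN(z; β) Φ(λz) with z = (x − µ)/σ, where f_BN is the BN(β) pdf. The name sbnn_loglik is hypothetical.

```python
import math

def log_norm_cdf(t):
    """log Phi(t); returns -inf if Phi(t) underflows to zero."""
    p = 0.5 * math.erfc(-t / math.sqrt(2.0))
    return math.log(p) if p > 0.0 else -math.inf

def sbnn_loglik(params, xs):
    """Log-likelihood of an assumed SBNN(mu, sigma, lam, beta) model.

    Per observation, with z = (x - mu) / sigma:
      -log(sigma) + ((1 - beta)/2) log 2 - log Gamma((beta + 1)/2)
      + beta log|z| - z^2 / 2 + log Phi(lam z)
    """
    mu, sigma, lam, beta = params
    if sigma <= 0.0 or beta < 0.0:
        return -math.inf  # outside the parameter space
    const = (-math.log(sigma)
             + 0.5 * (1.0 - beta) * math.log(2.0)
             - math.lgamma(0.5 * (beta + 1.0)))
    total = len(xs) * const
    for x in xs:
        z = (x - mu) / sigma
        if beta > 0.0:
            if z == 0.0:
                return -math.inf  # density is zero at x = mu when beta > 0
            total += beta * math.log(abs(z))
        total += -0.5 * z * z + log_norm_cdf(lam * z)
    return total

# With lam = 0 and beta = 0 the model is N(mu, sigma^2), so this equals
# the normal log-likelihood of the two observations.
normal_case = sbnn_loglik((0.0, 1.0, 0.0, 0.0), [1.0, -1.0])
```

In practice the negative of this function could be passed to a bounded quasi-Newton routine, for example scipy.optimize.minimize with method="L-BFGS-B" in Python, which plays the same role as optim in R. The bounds σ > 0 and β ≥ 0 would need to be supplied to the optimizer.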

Simulation study
In this section, the expected value and standard error (in parentheses) of the MLEs are obtained using the following simulation steps. First, a sample of size n (200 and 500) is generated from SBNN(µ, σ, λ, β) for known parameter values. Second, for each sample from step 1, the MLEs are computed using the optim function in R with the L-BFGS-B method. Steps 1 and 2 are repeated 1000 times; then, for each parameter, the mean and standard deviation of these 1000 repetitions are calculated as the simulated mean and standard error of the MLE. The results are presented in Tables 1 and 2. The tables indicate that the estimators are asymptotically unbiased and that their efficiency increases as the sample size increases. Figure 3 shows the densities of the simulated samples from SBNN(1, 1, 1, 1) for n = 50, 200, 500, 1000.

Applications
In this section, the SBNN is fitted to three datasets. Table 3 shows descriptive statistics for these datasets, where β_1 and β_2 represent the coefficients of skewness and kurtosis, respectively.

The first dataset was used by Azzalini and Bowman [9] and is available as "geyser" in the R package "MASS". The data consist of 299 pairs of measurements, referring to the time interval between the starts of successive eruptions (waiting variable) and the duration of the subsequent eruption (duration variable) of the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. The duration variable is considered in this article.

The second dataset, called "Egg size data", was studied by Sewell and Young [35] and Famoye et al. [11]. The data represent the logarithm of the egg diameters of 88 asteroid species. The histogram of the data is roughly symmetric and bimodal, as shown in Figure 5.

The third dataset is called "Pollen data". The nub variable of the pollen data is available at http://lib.stat.cmu.edu/datasets/pollen.data. These data result from measuring geometric characteristics of a certain type of pollen.

For the first dataset, the MLEs of the parameters of the fitted models are given in Table 4. The Akaike criterion (AIC), modified Akaike criterion (AICC), Bayesian criterion (BIC), and Kolmogorov-Smirnov test (KS test) for goodness of fit are given in Table 5. Based on the KS test, the SBNN is the best model among the rival models in Table 5. Based on the AIC, AICC and BIC criteria, the SBNN is the best model among the rival models except for Mixed-N2. Note, however, that the SBNN distribution does not need an identifiability condition (switch case) for fitting data. The histogram of the data and the pdfs of the fitted models are shown in Figure 4, which confirms the superiority of the SBNN.

For the second dataset, the MLEs of the parameters of the fitted models are given in Table 6. The AIC, AICC, BIC and KS test for goodness of fit are given in Table 7. Table 7 shows that the SBNN is the best model among the rival models except for Mixed-N2. However, the SBNN distribution does not need an identifiability condition (switch case) for fitting data. The histogram of the data and the pdfs of the fitted models are shown in Figure 5.

Comparison of fitted models to the third dataset
As with the two previous datasets, the pdfs of the SBN, SSCN, BEP, Mixed-N2 and SBNN models are fitted to the Pollen data. The MLEs of the parameters of these models are given in Table 8. The AIC, AICC, BIC and KS test for goodness of fit are given in Table 9. Table 9 shows that the SBNN is the best model among the rival models. The histogram of the data and the pdfs of the fitted models are shown in Figure 6, which confirms the superiority of the SBNN.
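The comparison criteria used in this section can be computed from a maximized log-likelihood and a fitted cdf. The sketch below uses the standard textbook definitions. It assumes that the "modified Akaike criterion" is the usual small-sample corrected AIC, which the paper does not spell out. The function names are hypothetical, and the numbers in the example are illustrative inputs, not values from Tables 4 to 9.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, AICC and BIC for a model with k parameters fitted to n observations.

    loglik is the maximized log-likelihood. AICC is assumed to be the
    small-sample correction AIC + 2k(k + 1)/(n - k - 1).
    """
    aic = 2.0 * k - 2.0 * loglik
    aicc = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)
    bic = k * math.log(n) - 2.0 * loglik
    return {"AIC": aic, "AICC": aicc, "BIC": bic}

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf and a fitted cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The empirical cdf jumps from (i - 1)/n to i/n at the i-th order statistic
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

# Illustrative inputs only: a four-parameter model, 299 observations
crit = information_criteria(loglik=-100.0, k=4, n=299)
# Toy sample compared with the uniform cdf on [0, 1]
d_uniform = ks_statistic([0.5, 0.1, 0.9], lambda x: x)
```

Smaller values of AIC, AICC, BIC and of the KS distance indicate a better fit. All three information criteria penalize the number of parameters, so a four-parameter model such as the SBNN is compared with its rivals on an adjusted scale.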

Conclusion
In this paper, a new four-parameter weighted model of skew-normal distributions, called the weighted absolute-power skew normal of order β, was introduced. The normal distribution, the skew-normal distribution, the bimodal normal distribution and the skew bimodal normal distribution are special cases of this model. A method for generating data from this model was presented. The maximum likelihood estimates of the parameters were obtained by numerical methods and evaluated using a simulation study. The model was fitted to the eruption durations of the Old Faithful geyser, the Egg size data and the Pollen data. The superiority of the model over rival distributions was shown by several goodness-of-fit criteria. Although the estimates proposed in this paper have been shown to work better than other methods, comparing these estimates with Bayesian estimates may be the subject of further research.

Figure 5. Histogram and pdf of the fitted models for Egg size data

Appendix
Suppose x_1, x_2, ..., x_n is an observed random sample from SBNN(µ, σ, λ, β) with the log-likelihood function (18). The following elements of the score vector are obtained by differentiating (18) with respect to the parameters

where Ψ_0(z) = (d/dz) log Γ(z) and Ψ_n(z) = (d^(n+1)/dz^(n+1)) log Γ(z). The Hessian matrix is given by
ℓ_{µ,µ} ℓ_{µ,σ} ℓ_{µ,λ} ℓ_{µ,β}
ℓ_{σ,µ} ℓ_{σ,σ} ℓ_{σ,λ} ℓ_{σ,β}
ℓ_{λ,µ} ℓ_{λ,σ} ℓ_{λ,λ} ℓ_{λ,β}
ℓ_{β,µ} ℓ_{β,σ} ℓ_{β,λ} ℓ_{β,β}
with
ℓ_{σ,β} = ℓ_{β,σ} = −n/σ, ℓ_{λ,β} = ℓ_{β,λ} = 0.