A Density-Based Empirical Likelihood Ratio Approach for Goodness-of-fit Tests in Decreasing Densities

In this paper, we propose a test of the null hypothesis that a decreasing density function belongs to a given parametric family of distributions against the non-parametric alternative. The method, which is based on an empirical likelihood (EL) ratio statistic, is similar to the test introduced by Vexler and Gurevich [23]. The consistency of the proposed test statistic is derived under the null and alternative hypotheses. A simulation study is conducted to inspect the power of the proposed test under various decreasing alternatives. In each scenario, the critical region of the test is obtained using a Monte Carlo technique. The applicability of the proposed test in practice is demonstrated through a few real data examples.


Introduction
Suppose that we are interested in estimating a decreasing density function f. We may employ a non-parametric method to estimate the density of interest under the decreasing constraint. Alternatively, one can postulate a parametric model for f that respects the monotonicity restriction. In such situations, an appropriate goodness-of-fit (GOF) test is required to check the applicability of the presumed model. The purpose of this article is therefore to propose a suitable test for this problem. In other words, we aim to test whether f belongs to a given parametric family of decreasing density functions against the non-parametric alternative under the decreasing constraint.
One of the first papers on non-parametric density estimation under shape constraints is Grenander [9], which shows that the non-parametric maximum likelihood estimator (NPMLE) of a decreasing density, based on an independent and identically distributed (i.i.d.) sample, is the slope of the least concave majorant of the empirical distribution function. Huang and Wellner [11] studied a Grenander-type estimator of the density function and hazard rate under a monotonicity constraint in a right-censoring model.
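The Grenander construction just described can be sketched numerically. The code below is our own minimal illustration, not the authors' implementation: it computes the left slopes of the least concave majorant of the empirical distribution function by an upper-convex-hull scan, assuming a sample without ties (continuous data).

```python
import numpy as np

def grenander(x):
    """Grenander NPMLE of a decreasing density on [0, inf).

    Returns (knots, slopes): the estimated density equals slopes[k] on
    (knots[k], knots[k+1]] and 0 beyond the largest observation.
    Assumes a sample without ties (continuous data).
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size

    def slope(a, b):
        return (b[1] - a[1]) / (b[0] - a[0])

    # vertices of the empirical CDF: (0, 0) and (x_(i), i/n)
    pts = [(0.0, 0.0)] + [(x[i], (i + 1) / n) for i in range(n)]
    hull = []
    for p in pts:
        # the least concave majorant is the upper convex hull of these
        # vertices, so consecutive hull slopes must be decreasing
        while len(hull) >= 2 and slope(hull[-2], hull[-1]) <= slope(hull[-1], p):
            hull.pop()
        hull.append(p)
    knots = np.array([q[0] for q in hull])
    slopes = np.array([slope(hull[k], hull[k + 1]) for k in range(len(hull) - 1)])
    return knots, slopes
```

For example, for the sample {1, 2, 4} the majorant has vertices (0, 0), (2, 2/3) and (4, 1), so the estimate equals 1/3 on (0, 2] and 1/6 on (2, 4]; the slopes are decreasing and integrate to one, as required of a density.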
Neyman [17], which has inspired many other studies, proposed a method to test a parametric null hypothesis against a non-parametric alternative. Another popular approach for the underlying problem is to reject the null hypothesis if an appropriate non-parametric estimator is far enough from the parametric estimator computed under the null hypothesis. For more information about these two approaches and related references, see Durot and Reboul.
Whether the decreasing assumption is appropriate depends on the researcher's prior experience and information about the study undertaken. It is the framework of the study that suggests whether this assumption is valid or not. However, some statistical tools can be applied to inspect the features of the underlying density. For instance, a descriptive study of the data of interest can provide researchers with invaluable information about the underlying distribution. Among the available methods, the histogram is one of the most suitable statistical tools for this purpose.
In this paper, we introduce a GOF test for decreasing densities based on an empirical likelihood ratio statistic. Using a methodology similar to that of Vexler and Gurevich [23], we propose a GOF test whose alternative hypothesis is subject to the decreasing constraint. The layout of the rest of this paper is as follows. In Section 2, we present the test statistic and study its consistency under the null and alternative hypotheses. The results of the simulation study on critical values and power of the test statistic are discussed in Section 3. In Section 4, the performance of the proposed test is compared with another existing method on a set of real data.

The test statistics via EL
Let X_1, ..., X_n be an i.i.d. sample of size n from an unknown density function f. For testing the simple hypothesis H_0^*: f = f_0 versus the simple alternative H_1: f = f_1, the likelihood ratio test statistic is defined as

LR = ∏_{i=1}^{n} f_1(X_i) / ∏_{i=1}^{n} f_0(X_i).   (1)

For the simple hypotheses H_0^* and H_1, the Neyman-Pearson lemma indicates that the likelihood ratio-based test is the most powerful test when f_0 and f_1 are both known.
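In code form, the Neyman-Pearson statistic is simply a sum of pointwise log-density ratios. The sketch below is our own illustration, not taken from the paper; the choice of a standard exponential f_0 and a standard half-normal f_1 (both decreasing densities on [0, ∞)) is an assumption made purely for the example.

```python
import numpy as np

def log_lr(x, log_f1, log_f0):
    """Neyman-Pearson log likelihood ratio:
    sum_i [log f1(x_i) - log f0(x_i)]; large values favour f1."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(log_f1(x) - log_f0(x)))

# two fully specified decreasing densities on [0, inf), chosen for illustration
def log_exp(x):
    # Exp(1): f0(x) = exp(-x)
    return -x

def log_halfnormal(x):
    # half-normal with sigma = 1: f1(x) = sqrt(2/pi) * exp(-x^2 / 2)
    return 0.5 * np.log(2.0 / np.pi) - x ** 2 / 2.0
```

The most powerful level-α test then rejects H_0^* when this statistic exceeds the appropriate threshold.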
Utilizing the EL concept to construct a GOF test, Vexler and Gurevich [23] proposed a density-based method for an unknown f_1 and a known f_0, which may depend on some unknown parameters, to test H_0 versus H_1. Approximating the Neyman-Pearson test statistic non-parametrically through likelihood ratios, they introduced extensions that possess great power. In fact, they applied the empirical likelihood method to derive the values of f_{H_1}(·) that maximize the numerator of Equation (1) subject to the constraint ∫ f(x)dx = 1 under the alternative hypothesis. The empirical likelihood method allows researchers to incorporate auxiliary information through estimating equations without having to choose a parametric family for the data. A comprehensive overview of the empirical likelihood method can be found in Owen [20].
Suppose that we wish to test the parametric null hypothesis

H_0 : f = f_λ for some λ ∈ Λ,

versus the non-parametric alternative that f is decreasing, where Λ ⊆ R is a given set and, for every λ, f_λ is a known decreasing density function (up to the parameter λ) on [0, ∞).
To be more precise, let the alternative hypothesis be

H_a : f ∈ F \ {f_λ : λ ∈ Λ},

where F denotes the class of decreasing densities on [0, ∞). By Grenander [9], since F consists of decreasing densities, for the random sample X_1, ..., X_n there exists an NPMLE of the density function in this class, given by the left-hand slope of the least concave majorant of the empirical distribution function; explicitly, for X_(i-1) < x ≤ X_(i),

f̂_n(x) = min_{0≤r≤i-1} max_{i≤s≤n} (s - r) / (n(X_(s) - X_(r))),   (2)

in which X_(0) = 0 and, without loss of generality, X_1, ..., X_n are assumed to be ordered. We call this estimator the Grenander estimator in the rest of this article. Inspired by (1) and applying the NPMLE (2), we derive a new EL ratio version of the GOF test of H_0 against H_a based on

LR_n = ∏_{i=1}^{n} f̂_n(X_i) / ∏_{i=1}^{n} f_{λ̂_n}(X_i),   (3)

rejecting H_0 whenever log(LR_n) > C_α, where λ̂_n is the maximum likelihood estimator of λ under H_0 and C_α is a test threshold. Bear in mind that it is now necessary to find the asymptotic distribution of the proposed test statistic, LR_n, which involves the Grenander [9] estimator, and the construction of the critical region depends on this distribution. Also, in order to make comparisons in the simulation study, it will be seen that we need another asymptotic distribution for a second test statistic obtained via Vasicek's entropy estimator. However, the asymptotic distributions of these statistics are known to be analytically intractable. Indeed, they involve the estimates of the parameters in the denominator under H_0, which affects the variance of the test statistics as well.
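The statistic LR_n can be computed directly: evaluate the Grenander estimate at each observation, fit the null family, and take the log-ratio of the two likelihoods. The sketch below is our own illustration, not the authors' code; the exponential null f_λ(x) = λe^(-λx) with MLE λ̂ = 1/X̄ and a tie-free sample are assumptions of the demo.

```python
import numpy as np

def grenander_at_data(x):
    """Grenander estimate (slopes of the least concave majorant of the
    empirical CDF) evaluated at the sorted observations; assumes no ties."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    slope = lambda a, b: (b[1] - a[1]) / (b[0] - a[0])
    hull = []
    for p in [(0.0, 0.0)] + [(x[i], (i + 1) / n) for i in range(n)]:
        while len(hull) >= 2 and slope(hull[-2], hull[-1]) <= slope(hull[-1], p):
            hull.pop()
        hull.append(p)
    knots = np.array([q[0] for q in hull])
    slopes = np.array([slope(hull[k], hull[k + 1]) for k in range(len(hull) - 1)])
    # x_(i) lies in the hull segment whose right endpoint is the first knot >= x_(i)
    return x, slopes[np.searchsorted(knots, x, side="left") - 1]

def log_LRn_exponential(x):
    """log(LR_n): Grenander log-likelihood minus the fitted exponential
    log-likelihood (the exponential null is an assumption of this sketch)."""
    xs, fhat = grenander_at_data(x)
    lam = 1.0 / xs.mean()                 # MLE of the exponential rate
    return float(np.sum(np.log(fhat)) - np.sum(np.log(lam) - lam * xs))
```

H_0 is then rejected when log(LR_n) exceeds the Monte Carlo critical value C_α.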
However, as in the recent literature on goodness-of-fit tests, such as Vexler and Gurevich [23], Hall and Welsh [10], and Mudholkar and Tian [15, 16], we do not attempt to calculate the critical regions of the introduced tests analytically. Instead, given the definition of the test statistic LR_n, we estimate the values of C_α satisfying

P_{H_0}(log(LR_n) > C_α) = α   (5)

by the Monte Carlo method. Table 2 lists the Monte Carlo roots of this equation for different values of α and n based on samples from the exponential distribution.
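The root of this equation can be approximated by simulation, as is done for Table 2. The sketch below is our own illustration of that step (the exponential null with plug-in MLE, tie-free data, and a small replication count are assumptions of the demo): draw repeated null samples, compute log(LR_n) for each, and report the empirical (1 - α)-quantile. By Remark 1 the null distribution does not depend on the rate, so sampling from Exp(1) suffices.

```python
import numpy as np

def _log_LRn_exp(x):
    """log(LR_n) against a fitted exponential null (compact restatement;
    the exponential family and tie-free data are assumptions of the sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    slope = lambda a, b: (b[1] - a[1]) / (b[0] - a[0])
    hull = []
    for p in [(0.0, 0.0)] + [(x[i], (i + 1) / n) for i in range(n)]:
        while len(hull) >= 2 and slope(hull[-2], hull[-1]) <= slope(hull[-1], p):
            hull.pop()
        hull.append(p)
    knots = np.array([q[0] for q in hull])
    slopes = np.array([slope(hull[k], hull[k + 1]) for k in range(len(hull) - 1)])
    fhat = slopes[np.searchsorted(knots, x, side="left") - 1]
    lam = 1.0 / x.mean()
    return float(np.sum(np.log(fhat)) - np.sum(np.log(lam) - lam * x))

def mc_critical_value(n, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo root of P_H0(log LR_n > C_alpha) = alpha: the empirical
    (1 - alpha)-quantile of log LR_n over repeated Exp(1) null samples."""
    rng = np.random.default_rng(seed)
    stats = [_log_LRn_exp(rng.exponential(size=n)) for _ in range(reps)]
    return float(np.quantile(stats, 1.0 - alpha))
```

The paper uses 10,000 replications; the default here is smaller only to keep the sketch light.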

Remark 1
It is worth noting that the Grenander estimator defined by (2) is distribution-free (see Grenander [9]), and therefore the numerator of LR_n does not depend on the unknown population parameter. Thus, when λ is known to be λ_0 under H_0, the distribution of LR_n is not a function of an unknown parameter. Consequently, the LR_n test is exact and the critical value C_α does not depend on an unknown parameter either.
The weak consistency of the proposed test statistic is presented in the following theorem. For this purpose, we first define and consider the following assumptions, of which (1)-(4) involve a neighbourhood containing λ and a together with a function g satisfying E(g(X_1)) < ∞, and:

(5) The true decreasing density f has a bounded support S on which f(x) ≥ ζ > 0 for all x ∈ S; moreover, f is continuously differentiable on S.

Theorem 1
Assume that Assumptions (1)-(5) are satisfied. Then, under H_0, n^{-1} log(LR_n) converges in probability to a, where a is a finite term.

Proof
First, we consider the component I_n of (1/n) log(LR_n). To deal with I_n, first note that the corresponding sum can be rewritten as an integral with respect to F̃_n, where F̃_n is the least concave majorant of the empirical distribution function F_n. The equality in (6) follows from the fact that log(f̂_n(x)) is constant on intervals (u, v], where u and v are successive vertices of F̃_n, and the functions F_n and F̃_n have equal increments on these intervals. Since F̃_n is absolutely continuous and equals the distribution function of f̂_n, the equality in (7) is obtained. Also, the convention 0 × ∞ = 0 can be adopted because f̂_n(x) is zero for all x > X_(n), where X_(n) denotes the largest order statistic of the random variables X_1, ..., X_n. By Theorem 3 in Nickl [18], the estimated entropy functional in (7) converges. The statistic (3) may now be written in the form (8). Given Equation (8), under H_0 the representations (9) and (10) follow. By the weak law of large numbers and Assumption (1), (11) holds as n → ∞. Now, applying Assumptions (2) and (4) and a one-term Taylor expansion, we obtain (12) as n → ∞, where the value η_n lies between λ̂_n and λ. Thus, under H_0 and using (8)-(12), we obtain the claimed limit. Similarly to the proof of (12), under Conditions (1)-(5), we conclude the corresponding result under the alternative. This completes the proof of Theorem 1.
In fact, Theorem 1 shows that the power of the test goes to 1 as n → ∞ under the alternative hypothesis, that is, the test is consistent.

Simulation study
Suppose X_1, ..., X_n are i.i.d. random variables from a distribution function F with corresponding density function f defined on [0, ∞). It is of interest to test the null hypothesis

H_0 : f = f_λ for some λ ∈ Λ,   (14)

versus

H_1 : f ∈ F \ {f_λ : λ ∈ Λ},

where λ could be either specified or unspecified and F = {f : f is decreasing}.
To inspect the performance of the proposed test, a nominal significance level of α = 0.05 is considered for each simulation scenario in this section. The test procedures are described in Sections 2 and 3.1. The performance of the tests is evaluated in terms of significance level and power, and the corresponding results are reported in Sections 3.2 and 3.3, respectively.

Tests
In this subsection, we consider both the test statistic introduced in (3) and that of Vexler and Gurevich [23], i.e.

G_n = min_{1 ≤ m < n^{1-δ}} ∏_{j=1}^{n} 2m / ( n (X_(j+m) - X_(j-m)) f_{λ̂}(X_(j)) ),

to test the null hypothesis H_0 defined in (14), where X_(1), ..., X_(n) are the order statistics of the sample X_1, ..., X_n. Note that X_(j) = X_(1) if j ≤ 1, and X_(j) = X_(n) if j ≥ n. Here λ̂ represents the MLE of λ and 0 < δ < 1. Accordingly, we reject the null hypothesis if, and only if, log(G_n) > C_α, where C_α is the critical value corresponding to level α obtained by applying (5). After Vexler and Gurevich [23] proposed the G_n test statistic, Vexler et al. [24] and Alizadeh Noughabi [1] showed that this statistic has greater power than other GOF tests. Our principal aim is to compare the test statistic proposed in this article (LR_n) with the G_n statistic. For this purpose, we conduct extensive Monte Carlo simulations to inspect the power of the proposed test for various alternatives.
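The spacing form of G_n is direct to compute. The sketch below is our own illustration (the fitted-exponential null, δ = 0.5, and a tie-free sample with n ≥ 2 are assumptions of the demo), using the boundary conventions X_(j) = X_(1) for j ≤ 1 and X_(j) = X_(n) for j ≥ n stated above.

```python
import numpy as np

def log_Gn(x, delta=0.5, log_f0=None):
    """log of the Vexler-Gurevich statistic: minimum over window sizes
    m < n^(1-delta) of sum_j [log(2m / (n (X_(j+m) - X_(j-m)))) - log f0(X_(j))].
    By default f0 is an exponential density with its MLE plugged in."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if log_f0 is None:
        lam = 1.0 / x.mean()                       # exponential MLE (assumed null)
        log_f0 = lambda t: np.log(lam) - lam * t
    j = np.arange(n)
    m_max = max(1, int(np.ceil(n ** (1.0 - delta))) - 1)
    vals = []
    for m in range(1, m_max + 1):
        hi = x[np.minimum(j + m, n - 1)]           # X_(j+m), clipped at X_(n)
        lo = x[np.maximum(j - m, 0)]               # X_(j-m), clipped at X_(1)
        vals.append(float(np.sum(np.log(2 * m / (n * (hi - lo))) - log_f0(x))))
    return min(vals)
```

H_0 is rejected when log(G_n) exceeds the Monte Carlo critical value obtained as in (5).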

Levels of Significance
To evaluate the performance of the proposed test LR_n, the empirical significance levels were calculated based on 10,000 replications of the tests, controlling the Type I error at the nominal level α = 0.05. The test statistics were calculated for various sample sizes from the exponential and half normal distributions. In each iteration for the exponential distribution, we obtained the sample test statistics by means of (3) and compared them with the critical values given in Table 2. We calculated the critical regions for the half normal distribution in the same way, but do not present these results here owing to space limitations. The percentage of rejections of the null hypothesis was taken as the size of the proposed test. The simulated results for the exponential distribution are listed in Table 1. According to this table, the Type I errors of the LR_n and G_n tests are well controlled. The empirical levels of both tests were reasonably close to the nominal value of 0.05, so both tests performed acceptably in this regard.
We have also calculated the Type I error of the LR_n and G_n tests for the half normal distribution. Compared with Table 1, the LR_n and G_n tests produced thoroughly comparable results, and the LR_n test exhibited even better Type I error control in this case. However, we do not present these results due to space limitations.

Power
In this section, we investigate the empirical power of the tests. As mentioned, since the exact distribution of the proposed test statistic is complicated to obtain, we estimated the quantiles of the distribution of LR_n using a Monte Carlo simulation technique. For each scenario, we simulated exponential observations of size n, calculated the LR_n statistic, and iterated this process 10,000 times. Finally, we obtained the upper α-percentiles for α equal to 0.01, 0.025, 0.05 and 0.1. The estimated critical values for the exponential distribution are reported in Table 2. Tables 3 and 4 compare the empirical power of the proposed test and that of Vexler et al. [24] for the null hypotheses that the data belong to the exponential and half normal distributions, respectively. All the distributions considered and the corresponding decreasing density functions included in this simulation study are given in the Appendix. Table 3 indicates that the empirical power of the tests rose as the sample size increased. In addition, the test based on the LR_n statistic had greater power than the G_n-based test, except when the alternatives were HL, HN(0.5) and GHN(1,2), for which the power of the G_n test was moderately greater. The results of the two methods for PaI(2,0,1) were almost comparable. However, the power of the LR_n test was substantially greater than that of the G_n test in most scenarios, especially for the alternatives HC(2), DuMouchel(5), KPI(2,2,0.2,2), Reciprocal(2,5) and Weibull(0,1,0.5).
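Each power entry in these tables is a rejection proportion: calibrate C_α under the null, then count rejections under the alternative. The self-contained sketch below is our own illustration of that pipeline; the spacing-based G_n statistic (compact restatement), the exponential null, the half-normal alternative, and the small replication counts are all assumptions made for the demo rather than the paper's settings.

```python
import numpy as np

def _log_Gn_exp(x, delta=0.5):
    """Vexler-Gurevich log-statistic against a fitted exponential null."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    lam = 1.0 / x.mean()
    j = np.arange(n)
    m_max = max(1, int(np.ceil(n ** (1.0 - delta))) - 1)
    vals = []
    for m in range(1, m_max + 1):
        hi = x[np.minimum(j + m, n - 1)]
        lo = x[np.maximum(j - m, 0)]
        vals.append(float(np.sum(np.log(2 * m / (n * (hi - lo))) - (np.log(lam) - lam * x))))
    return min(vals)

def empirical_power(n, alpha=0.05, reps=1000, seed=0):
    """(1) calibrate C_alpha as the (1 - alpha)-quantile of the statistic
    under Exp(1) null samples; (2) return the rejection rate under a
    half-normal(0, 1) alternative."""
    rng = np.random.default_rng(seed)
    null_stats = np.array([_log_Gn_exp(rng.exponential(size=n)) for _ in range(reps)])
    c_alpha = float(np.quantile(null_stats, 1.0 - alpha))
    alt_stats = np.array([_log_Gn_exp(np.abs(rng.normal(size=n))) for _ in range(reps)])
    return float(np.mean(alt_stats > c_alpha))
```

Replacing the half-normal draw with any other alternative from the Appendix reproduces the design of the corresponding table entry (with far more replications in the paper).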
By comparison, Table 4 shows that the proposed test exhibited clear superiority over the G_n test for all the alternatives when the null hypothesis stated that the data belong to a half normal distribution. Moreover, the empirical power of the tests climbed as the sample size increased.
In order to check whether the tests are robust, we applied both tests with the exponential and half normal distributions as null hypotheses against four other alternatives for which Conditions 3-4 appear to be problematic. Table 5 compares the power of the proposed test with that of the G_n test for the null hypothesis that the data belong to the exponential distribution, while Table 6 gives the results of similar scenarios for the null hypothesis that the data follow the half normal distribution. Two sample sizes, n = 10 and n = 20, were considered for the simulation results reported in these tables. In both Tables 5 and 6, the LR_n test produced much better results than the G_n test for all the underlying alternatives.
Broadly speaking, we can conclude that the proposed LR_n test performed better and exhibited superior power in comparison with the G_n test. The simulation results thus suggest the benefits of applying the proposed test (LR_n) in practice. Moreover, we conducted a simulation study at other significance levels to inspect and compare the power of the LR_n and G_n tests. The simulation results were comparable to those in the tables above, so the proposed test not only indicated good performance and high power at the different predetermined significance levels, but also retained its superiority over the G_n test. However, we have not presented these results here due to space restrictions.

Illustration
Through three examples, we illustrate how the proposed test can be applied as a GOF test for the exponential and Lomax distributions. The first real example is a set of survival-time data reported by Bekker et al. [3]. As can be seen in Figure 1, the histogram of the survival times is decreasing in a manner similar to the exponential density function. The Vasicek and Grenander estimators and the corresponding exponential curves have been plotted and added to the histogram. Compared with the Grenander estimator, the Vasicek estimator fluctuates considerably, whereas the former is much smoother. In this regard, based on Figure 1, the Grenander estimator is a better estimator of the density of the underlying data.
Regarding the test results for the patients' data, the G_n and LR_n statistics took the values 2.3741 and 11.081, respectively, which are less than the respective critical values of 2.5168 and 12.377. Accordingly, we have no reason to reject the null hypothesis, meaning the data are consistent with the exponential distribution.
The second real data set represents the remission times (in months) of a random sample of 128 bladder cancer patients reported in Lee and Wang (2003). It can be observed in Figure 2 that the histogram of the remission times is decreasing in a manner similar to the exponential density function. We again added the Vasicek and Grenander estimators to the histogram. Comparing the two, the Vasicek estimator varies markedly while the Grenander estimator behaves much more smoothly. Consequently, based upon Figure 2, one can conclude that the Grenander estimator models the density function of the underlying data better than the Vasicek estimator. Applying the G_n and LR_n statistics to the remission data yielded values of 1.6539 and 12.735, respectively, which are less than the corresponding critical values of 2.519 and 16.432. Thus, the null hypothesis cannot be rejected, indicating that the exponential distribution fits adequately.
The third set of data, given in Hubble [12], contains the distances between extra-galactic nebulae and Earth in megaparsecs. Alshingiti et al. [2] concluded that this set of data follows the Lomax distribution. Figure 3 illustrates the histogram of the distance data as well as the corresponding Vasicek and Grenander estimates. The histogram appears to follow the Lomax distribution to some degree. This fact, alongside the claim of Alshingiti et al. [2], led us to plot the Lomax curve in Figure 3. In contrast to the fluctuations in the Vasicek curve, the Grenander estimator gives a smoother fit and describes the behaviour of the data more reasonably.
Turning to the test results for the distance data, the G_n and LR_n statistics were 8.200 and 9.639, respectively, while the corresponding estimated critical values were 7.987 and 8.543. Therefore, the null hypothesis is rejected and, in contrast to the findings of Alshingiti et al. [2], the proposed test indicates that the distances between extra-galactic nebulae and Earth do not follow the Lomax distribution for this data set.

Conclusion
In this paper, we have presented a test of hypothesis specifically designed for decreasing density functions using an empirical likelihood ratio statistic. We have inspected the proposed GOF test for a few distributions, including the exponential and half normal distributions, and have observed that the test outperforms the G_n goodness-of-fit test. We carried out extensive power comparisons between these tests using a Monte Carlo simulation study. According to the simulation results, the proposed test outperforms its counterpart in most of the cases for the exponential distribution and in all the cases for the half normal distribution. Finally, we have presented three sets of real data and have illustrated how the proposed test may be applied to assess the fit of the exponential and Lomax distributions in practice.
• Reciprocal distribution Reciprocal(a, b) with the following density function:
• Weibull distribution Weibull(a, b, c) with the following distribution function:
• Kumaraswamy Pareto type I distribution KPI(a, b, k, θ) with the following distribution function:
• Dagum distribution Dagum(a, b, p) with the following distribution function:
• Generalized Pareto distribution GPD(α, β, µ) with the following distribution function: