Discrete Inverted Nadarajah-Haghighi Distribution: Properties and Classical Estimation with Application to Complete and Censored data

In this article, we develop a discrete version of the continuous inverted Nadarajah-Haghighi distribution and call it the discrete inverted Nadarajah-Haghighi distribution. The proposed model can describe not only over-dispersed and positively skewed data but also data with an upside-down bathtub-shaped or decreasing failure rate, as well as randomly right-censored data. We derive several important statistical properties of the proposed model, such as the quantile function, median, moments, skewness, kurtosis, index of dispersion, entropy, expected inactivity time function, stress-strength reliability, and order statistics. The model parameters are estimated by the method of maximum likelihood under complete and censored data. An algorithm to generate randomly right-censored data from the proposed model is also presented. Extensive simulation studies are carried out to assess the behavior of the estimators with complete and censored data. Finally, two complete and two censored data sets are used to illustrate the utility of the proposed model.


Introduction
In many fields, including medical science, engineering science, economics, and other allied sciences, there is a great emphasis on modelling through continuous lifetime distributions [1,2,3,4]. The discretization phenomenon generally arises when it becomes impossible or inconvenient to measure the life length of a product or device on a continuous scale. Such situations may arise in many cases, for example, in survival analysis, the survival times for those suffering from a brain tumor or period from remission to relapse may be recorded as the number of days; in reliability engineering, the functioning status of a system is examined every unit time period and observed data indicate the number of time units successfully completed prior to a breakdown; the lifetime of a copy machine is the total number of copies it produces before it fails, etc. Besides the lifetime data, the count phenomenon arises in many practical situations, such as the number of earthquakes that occur in a calendar year, the number of accidents, the number of species types in ecology, the number of insurance claims, and so on. Therefore, discrete distributions are quite meaningful to model lifetime data in such situations.
In this context, the geometric and negative binomial distributions are known as the discrete alternatives to the exponential and gamma distributions, respectively. However, these distributions do not fit discrete data adequately in all practical situations. So due to the necessity of more plausible discrete distributions to model discrete data arises in various real-world
The discrete distribution generated by the above method retains the same functional form of the survival function (SF) as that of its continuous counterpart. Due to this feature, many reliability characteristics remain unchanged. So, there is enough motivation to use this method for developing discrete versions of existing continuous distributions.
In many cases, data collection is limited by certain constraints, such as temporal or financial restrictions, making it impossible to obtain the whole data set. Such incomplete data are called censored data. To analyze these data sets, various censoring schemes are available in the literature. The most popular are the conventional Type I and Type II censoring schemes. In Type I censoring, the event is observed only if it occurs prior to some pre-specified time, whereas in Type II censoring, the study continues until a predetermined number of individuals are observed to have failed. Another type of censoring is random censoring, in which subjects can be censored at any point during the experiment, with different censoring times. An example of random censoring can be seen in clinical trials or medical studies, where patients do not complete the course of treatment and leave before the termination point. For more details about censoring schemes, their generalizations, and analysis, one can refer to Klein and Moeschberger [23]. Randomly censored lifetime data frequently occur in many applications, such as medical science, biology, and reliability studies, and need to be analyzed properly to make correct inferences and draw suitable research conclusions. These data are often right-censored because it is not possible to observe the patients or the items under study until their death, or because patients may withdraw during the study period.
In the existing literature, the random censoring scheme has been widely studied under continuous models [24]. However, only a few studies have considered this censoring scheme for discrete models, viz. Krishna and Goel [25], de Oliveira et al. [26], and, most recently, Achcar et al. [27], who discussed classical and Bayesian inference for the exponentiated discrete Weibull distribution with censored data. Moreover, most of the existing discrete models were developed to analyze count data, and in most situations they fail to capture the diversity of censored data. This provides further motivation to discretize the inverted Nadarajah-Haghighi (INH) distribution so that we can adequately fit not just count data but also randomly right-censored data. Hence, the main objectives of the proposed model are as follows:
• To generate a model that not only fits positively skewed data but is also capable of modelling over-dispersed data.
• To develop a discrete model whose hazard rate function (HRF) can take upside-down bathtub and decreasing shapes.
• To generate a model for the probability distribution of count data.
• To provide consistently better fits than other well-known models in the literature on discrete distributions.
• To develop a model that can be appropriately used for analyzing randomly right-censored data.
The rest of the paper is organized as follows. Section 2 introduces the two-parameter DINH distribution. In Section 3, some important distributional and reliability characteristics are studied. In Section 4, we estimate the parameters of DINH distribution by the method of maximum likelihood (ML) under complete data. To observe the behavior of the ML estimators, a simulation study is performed in Section 5. The goodness-of-fit of the proposed model to two complete real data sets is also demonstrated in Section 5. The ML estimators for the model's parameters under randomly right-censored data are discussed in Section 6. The algorithm to generate censored data from the proposed model and numerical illustration with randomly right-censored empirical and real data are also presented in Section 6. Finally, some concluding remarks are given in Section 7.

Discrete inverted Nadarajah-Haghighi distribution
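The SF of the continuous INH distribution is given by
$$S(y; \alpha, \lambda) = 1 - \exp\left\{1 - \left(1 + \frac{\lambda}{y}\right)^{\alpha}\right\}, \quad y > 0. \tag{2}$$
Using Equation (2) in (1), the PMF of the DINH distribution is
$$P(X = x) = \exp\left\{1 - \left(1 + \frac{\lambda}{x + 1}\right)^{\alpha}\right\} - \exp\left\{1 - \left(1 + \frac{\lambda}{x}\right)^{\alpha}\right\}, \quad x = 0, 1, 2, \ldots, \tag{3}$$
where α > 0 and λ > 0 are the shape and scale parameters, respectively, and the second term is taken as 0 at x = 0. One can easily verify that the proposed PMF is proper, i.e., $\sum_{x=0}^{\infty} P(X = x) = 1$. The cumulative distribution function (CDF) corresponding to PMF (3) is given by
$$F(x) = \exp\left\{1 - \left(1 + \frac{\lambda}{x + 1}\right)^{\alpha}\right\}, \quad x = 0, 1, 2, \ldots \tag{4}$$
Here, it is to be noted that F(0) = exp(1 − (1 + λ)^α) and the proportion of non-zero values is 1 − F(0) = 1 − exp(1 − (1 + λ)^α).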
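As a numerical companion to the PMF and CDF of this section, the following Python sketch evaluates both. It assumes the CDF has the form F(x) = exp{1 − (1 + λ/(x + 1))^α}, which is inferred from the stated value F(0) = exp(1 − (1 + λ)^α) and from the survival-function discretization; the function names `dinh_cdf` and `dinh_pmf` are illustrative and do not come from the original article.

```python
import math

def dinh_cdf(x, alpha, lam):
    """Assumed CDF F(x) = exp{1 - (1 + lam/(x + 1))**alpha} for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    # 1 - (1 + y)**alpha is evaluated as -expm1(alpha * log1p(y)) for stability
    return math.exp(-math.expm1(alpha * math.log1p(lam / (x + 1.0))))

def dinh_pmf(x, alpha, lam):
    """PMF obtained as the difference of consecutive CDF values, F(x) - F(x - 1)."""
    if x < 0:
        return 0.0
    return dinh_cdf(x, alpha, lam) - dinh_cdf(x - 1, alpha, lam)

alpha, lam = 1.5, 2.0
# P(X = 0) coincides with F(0) = exp(1 - (1 + lam)**alpha)
print(dinh_pmf(0, alpha, lam), math.exp(1.0 - (1.0 + lam) ** alpha))
# Partial sums of the PMF telescope to F(n), which tends to 1 as n grows
print(sum(dinh_pmf(x, alpha, lam) for x in range(1001)), dinh_cdf(1000, alpha, lam))
```

Because the partial sums telescope to F(n), properness of the PMF reduces to F(n) → 1 as n → ∞, which the last line illustrates numerically.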

Shape of the PMF
The PMF plots of the DINH distribution for different parametric values are shown in Figure 1. The PMF of the suggested distribution may exhibit decreasing and unimodal (right-skewed) shapes, as seen in Figure 1. Furthermore, when α or λ, or both, are increased, the degree of asymmetry and peakedness of the PMF plots decreases. We can also observe that the PMF of DINH distribution approaches 0 as x → ∞.

Quantiles, random number generation, skewness and kurtosis
If x_m denotes the mth quantile of a discrete RV X, then P(X ≤ x_m) ≥ m and P(X ≥ x_m) ≥ 1 − m (Rohatgi and Saleh [28]). Using this result, we have the following theorem.

Theorem 1
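The mth quantile Q(m) of the DINH(α, λ) distribution is given by
$$Q(m) = \lceil x_m \rceil = \left\lceil \frac{\lambda}{(1 - \log m)^{1/\alpha} - 1} - 1 \right\rceil, \quad 0 < m < 1, \tag{5}$$
where ⌈x_m⌉ denotes the smallest integer greater than or equal to x_m.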

Proof
The proof follows directly from Equation (4).
A random integer can be sampled from the proposed distribution by using Equation (5), where m is a uniform random number on (0, 1). In particular, the median is given by
$$Q(0.5) = \left\lceil \frac{\lambda}{(1 + \log 2)^{1/\alpha} - 1} - 1 \right\rceil.$$
Kenney and Keeping [29] and Moors [30] provided well-known expressions for skewness and kurtosis based on quantiles. One of the most notable characteristics of these measures is that they are less influenced by outliers and may be computed even for distributions without moments. The expression of skewness (Sk) by Kenney and Keeping [29] is
$$Sk = \frac{Q(3/4) - 2Q(1/2) + Q(1/4)}{Q(3/4) - Q(1/4)}.$$
The Moors kurtosis (Ku) proposed by Moors [30] can be presented as
$$Ku = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}.$$
Using Equation (5) in the above expressions, one can easily obtain the Sk and Ku for the DINH distribution.
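The quantile-based quantities of this section can be sketched in Python as follows. The sketch assumes the CDF F(x) = exp{1 − (1 + λ/(x + 1))^α} (inferred from the stated F(0)), so that the quantile is the smallest integer q with F(q) ≥ m; the skewness and kurtosis use the standard quartile and octile formulas of Kenney and Keeping and of Moors. All function names are illustrative.

```python
import math
import random

def dinh_cdf(x, alpha, lam):
    """Assumed CDF F(x) = exp{1 - (1 + lam/(x + 1))**alpha}, x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return math.exp(-math.expm1(alpha * math.log1p(lam / (x + 1.0))))

def dinh_quantile(m, alpha, lam):
    """Smallest integer q >= 0 with F(q) >= m, for 0 < m < 1."""
    # Real-valued root of F(x) = m; expm1/log1p keep it stable for m near 1
    x_m = lam / math.expm1(math.log1p(-math.log(m)) / alpha) - 1.0
    q = max(0, math.ceil(x_m))
    # Correct possible off-by-one effects of floating-point rounding
    while q > 0 and dinh_cdf(q - 1, alpha, lam) >= m:
        q -= 1
    while dinh_cdf(q, alpha, lam) < m:
        q += 1
    return q

def dinh_sample(n, alpha, lam, rng):
    """Inverse-transform sampling: X = Q(U) with U uniform on (0, 1)."""
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # random() lies in [0, 1); skip an exact zero
            out.append(dinh_quantile(u, alpha, lam))
    return out

def bowley_skewness(alpha, lam):
    q1, q2, q3 = (dinh_quantile(m, alpha, lam) for m in (0.25, 0.5, 0.75))
    return (q3 - 2 * q2 + q1) / (q3 - q1)

def moors_kurtosis(alpha, lam):
    o = {k: dinh_quantile(k / 8.0, alpha, lam) for k in range(1, 8)}
    return (o[7] - o[5] + o[3] - o[1]) / (o[6] - o[2])

alpha, lam = 1.0, 10.0
print("median:", dinh_quantile(0.5, alpha, lam))
print("Sk:", bowley_skewness(alpha, lam), "Ku:", moors_kurtosis(alpha, lam))
print("sample:", dinh_sample(10, alpha, lam, random.Random(1)))
```

For parameter values that concentrate most of the mass on very few integers, the first and third quartiles can coincide, in which case both ratios are undefined; the sketch does not guard against that case.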

Moments and related concepts
The moments of a probability distribution are important for measuring its different properties such as mean, variance, skewness, kurtosis, etc. The rth raw moment of the DINH distribution can be obtained by using
$$\mu'_r = E(X^r) = \sum_{x=0}^{\infty} x^r P(X = x). \tag{6}$$
Using Equation (6), the first four raw moments of the DINH distribution (Equations (7)-(10)) are obtained by setting r = 1, 2, 3, 4. The variance of the DINH distribution is
$$Var(X) = \mu'_2 - (\mu'_1)^2.$$
Using the raw moments (7)-(10), we can easily find the skewness and kurtosis based on moments from the following relations
$$\text{Skewness} = \frac{\mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3}{[Var(X)]^{3/2}}, \qquad \text{Kurtosis} = \frac{\mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4}{[Var(X)]^{2}}.$$
The index of dispersion (ID) is used to determine whether a data set is equi-, under- or over-dispersed. ID > 1 (< 1) indicates over-dispersion (under-dispersion), while ID = 1 indicates equi-dispersion. In the case of the proposed model, the ID is
$$ID = \frac{Var(X)}{E(X)}.$$
The coefficient of variation (CV) is a relative measure of variability and is generally used to compare two independent samples based on their variability. A large value of CV indicates higher variability. For the DINH distribution, the CV can be obtained as
$$CV = \frac{\sqrt{Var(X)}}{E(X)}.$$
It is not possible to get closed forms of the above expressions; therefore, we use R software to demonstrate these characteristics numerically. Table 1 lists some numerical results of the mean, variance, skewness, kurtosis, ID, and CV for the DINH distribution under different setups of parametric values. From this table, it can be concluded that:
• The mean and variance of the DINH distribution grow when the value of α or λ, or both, increases.
• From the observed values of skewness and kurtosis, we can conclude that the DINH distribution is positively skewed and leptokurtic. Also, as the values of α or λ, or both, increase, the skewness and kurtosis decrease.
• Since ID > 1, the suggested model is only appropriate for modelling over-dispersed data.
• As the value of α or λ, or both, rises, the ID and CV tend to increase.

Entropy
In information theory, the entropy of an RV is the average level of "information", "surprise", or "uncertainty" contained in the possible outcomes of that variable. One important entropy measure is the Rényi entropy (RE) (see Rényi [31]). It is a crucial measure of complexity and uncertainty, and is used in many fields, including problem identification in statistics, statistical inference, physics, econometrics, and pattern recognition in computer science. For the DINH distribution, the RE can be defined as (ρ > 0, ρ ≠ 1)
$$I_R(\rho) = \frac{1}{1 - \rho} \log \sum_{x=0}^{\infty} [P(X = x)]^{\rho}.$$
Another well-known entropy, the Shannon entropy (ShE), can be obtained as a particular case of the RE (ρ → 1) as
$$ShE = -\sum_{x=0}^{\infty} P(X = x) \log P(X = x).$$

Survival and hazard rate functions
The SF and HRF of the DINH distribution are, respectively, given by
As a result, the suggested distribution is more flexible for evaluating a broad range of data than traditional and recently established models, owing to the distinctive shapes of its HRF. Also, the reversed hazard rate function (RHRF) and the second rate of failure (SRF) of the proposed model are, respectively,
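The hazard rate shapes mentioned above can be explored numerically with a short Python sketch. Two assumptions are made here and are not taken from the article: the CDF is F(x) = exp{1 − (1 + λ/(x + 1))^α}, and the discrete hazard rate follows the common convention h(x) = P(X = x)/P(X ≥ x); the article may use a different convention. The names `sf_ge` and `hazard` are illustrative.

```python
import math

def sf_ge(x, alpha, lam):
    """P(X >= x) = 1 - F(x - 1) under the assumed CDF; equals 1 for x <= 0."""
    if x <= 0:
        return 1.0
    # 1 - exp{1 - (1 + lam/x)**alpha}, written with expm1/log1p for stability
    return -math.expm1(-math.expm1(alpha * math.log1p(lam / x)))

def hazard(x, alpha, lam):
    """Assumed discrete hazard rate h(x) = P(X = x) / P(X >= x)."""
    return 1.0 - sf_ge(x + 1, alpha, lam) / sf_ge(x, alpha, lam)

for alpha, lam in [(1.0, 1.0), (0.5, 0.5)]:
    values = [round(hazard(x, alpha, lam), 4) for x in range(6)]
    print(f"alpha={alpha}, lam={lam}: h(0..5) = {values}")
```

Under these assumptions, the first parameter pair gives a hazard that rises from x = 0 to x = 1 and then falls (an upside-down bathtub shape), while the second gives a decreasing hazard over the printed range.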

Expected inactivity time function
The expected inactivity time function, or mean past life (MPL) function, denoted by m*(i), measures the time elapsed since the failure of X given that the system has failed sometime before i. It has many applications in a wide variety of areas, including reliability theory, survival analysis, actuarial research, and forensic science. In the discrete setup, the MPL function is defined as
$$m^{*}(i) = E(i - X \mid X < i) = \frac{1}{F(i - 1)} \sum_{k=1}^{i} F(k - 1), \quad i = 1, 2, \ldots$$
By replacing the CDF (4) in the expression of m*(i), we can easily obtain the MPL for the proposed model.
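A minimal Python sketch of the MPL computation follows. It assumes the CDF F(x) = exp{1 − (1 + λ/(x + 1))^α} and the usual discrete definition m*(i) = E(i − X | X < i), which can be written as the sum of F(k − 1) for k = 1, ..., i divided by F(i − 1); the function names are illustrative.

```python
import math

def dinh_cdf(x, alpha, lam):
    """Assumed CDF F(x) = exp{1 - (1 + lam/(x + 1))**alpha}, x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return math.exp(-math.expm1(alpha * math.log1p(lam / (x + 1.0))))

def mean_past_life(i, alpha, lam):
    """m*(i) = sum_{k=1}^{i} F(k - 1) / F(i - 1) for a positive integer i."""
    if i < 1:
        raise ValueError("i must be a positive integer")
    total = sum(dinh_cdf(k - 1, alpha, lam) for k in range(1, i + 1))
    return total / dinh_cdf(i - 1, alpha, lam)

alpha, lam = 1.5, 2.0
for i in (1, 2, 5, 10):
    print(i, mean_past_life(i, alpha, lam))
```

For i = 1 the only way to have failed before i is X = 0, so the elapsed time is exactly 1, which gives a simple sanity check on the implementation.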

Stress-strength parameter
The probability R = P[X > Z] is a measure of component reliability and is known as the stress-strength (SS) parameter. The SS model describes the life of a component that has a random strength X and is subjected to a random stress Z. The component fails as soon as the stress applied to it exceeds its strength; otherwise, it works satisfactorily. The SS model has many applications in various areas, including engineering, medical science, and psychology. For a detailed review of SS models, one may refer to Choudhary et al. [32]. For discrete independent RVs X and Z, the SS reliability is defined as
where P_X(x) and F_Z(x), respectively, denote the PMF and CDF of the independent discrete RVs X and Z. Let X ∼ DINH(α₁, λ₁) and Z ∼ DINH(α₂, λ₂). Then, from Equations (3) and (4), we have
Given the difficulty of obtaining an explicit expression for the SS reliability R in this instance, we illustrate this feature numerically using the R software. Tables 2 and 3 report the calculated values of R for various parameter combinations. From Tables 2 and 3, we can infer that reliability decreases with λ₁ → ∞ or λ₂ → ∞ for fixed values of α₁ and α₂. Similarly, when α₁ → ∞ or α₂ → ∞, reliability also decreases under particular setups of λ₁ and λ₂.
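Since R has no closed form, it can be approximated by a truncated sum. The Python sketch below is written under explicit assumptions: the CDF is F(x) = exp{1 − (1 + λ/(x + 1))^α}, and the strict inequality in R = P[X > Z] is taken literally, so that R is the sum over x ≥ 1 of P_X(x) F_Z(x − 1). The article's own summation convention may differ, so values need not match Tables 2 and 3. The function names and the truncation point `n_terms` are illustrative.

```python
import math

def sf_ge(x, alpha, lam):
    """P(X >= x) = 1 - F(x - 1) under the assumed CDF; equals 1 for x <= 0."""
    if x <= 0:
        return 1.0
    return -math.expm1(-math.expm1(alpha * math.log1p(lam / x)))

def stress_strength(a1, l1, a2, l2, n_terms=10000):
    """Approximate R = P(X > Z) for independent X ~ DINH(a1, l1), Z ~ DINH(a2, l2)."""
    r = 0.0
    for x in range(1, n_terms + 1):
        p_x = sf_ge(x, a1, l1) - sf_ge(x + 1, a1, l1)  # P(X = x)
        f_z = 1.0 - sf_ge(x, a2, l2)                   # P(Z <= x - 1)
        r += p_x * f_z
    # Remaining mass P(X > n_terms), weighted by P(Z <= n_terms) (a lower bound)
    r += sf_ge(n_terms + 1, a1, l1) * (1.0 - sf_ge(n_terms + 1, a2, l2))
    return r

print(stress_strength(1.5, 2.0, 1.5, 2.0))
print(stress_strength(1.0, 5.0, 1.0, 1.0))
```

The tail correction matters because the summands decay slowly; without it, the truncated sum would converge only at a rate of roughly 1/n_terms. When both variables share the same parameters, symmetry gives P(X > Z) = (1 − P(X = Z))/2, which provides an independent check.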

Order statistics
Order statistics play a vital role in the construction of tolerance intervals and in drawing inferences on population parameters, especially in survival analysis. Let X_1, X_2, ..., X_n be a random sample (RS) from the DINH(α, λ) distribution. Also, let X_(1) ≤ X_(2) ≤ ... ≤ X_(n) represent the corresponding order statistics. Then, the CDF of the rth order statistic, say W = X_(r), is given by
$$P(W \le w) = \sum_{j=r}^{n} \binom{n}{j} [F(w)]^{j} [1 - F(w)]^{n - j}.$$

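The corresponding PMF of the rth order statistic is
$$P(W = w) = \sum_{j=r}^{n} \binom{n}{j} \left\{ [F(w)]^{j} [1 - F(w)]^{n - j} - [F(w - 1)]^{j} [1 - F(w - 1)]^{n - j} \right\}, \tag{13}$$
where F(·) is the CDF (4). In particular, by setting r = 1 and r = n in Equation (13), we obtain the PMFs of the minimum and the maximum of X_1, ..., X_n, respectively.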
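The order-statistic distribution can be evaluated directly from the binomial representation, as in the following Python sketch. It assumes the CDF F(x) = exp{1 − (1 + λ/(x + 1))^α}; the function names are illustrative.

```python
import math

def dinh_cdf(x, alpha, lam):
    """Assumed CDF F(x) = exp{1 - (1 + lam/(x + 1))**alpha}, x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return math.exp(-math.expm1(alpha * math.log1p(lam / (x + 1.0))))

def order_stat_cdf(w, r, n, alpha, lam):
    """P(X_(r) <= w): at least r of the n observations are <= w."""
    f = dinh_cdf(w, alpha, lam)
    return sum(math.comb(n, j) * f ** j * (1.0 - f) ** (n - j)
               for j in range(r, n + 1))

def order_stat_pmf(w, r, n, alpha, lam):
    """P(X_(r) = w) as a difference of consecutive CDF values."""
    return (order_stat_cdf(w, r, n, alpha, lam)
            - order_stat_cdf(w - 1, r, n, alpha, lam))

alpha, lam, n = 1.5, 2.0, 5
print("P(min = 0):", order_stat_pmf(0, 1, n, alpha, lam))
print("P(max = 3):", order_stat_pmf(3, n, n, alpha, lam))
```

For r = n the CDF reduces to [F(w)]^n and for r = 1 to 1 − [1 − F(w)]^n, which are convenient checks.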

Some other important results on DINH distribution
a. For α = 1, the DINH distribution reduces to the discrete inverted exponential distribution.
b. If Y ∼ INH(α, λ), then X = [Y] ∼ DINH(α, λ), where [·] denotes the integer part.
c. If Y is a non-negative continuous RV and t is a positive constant, then X = [Y/t] follows the DINH(α, λ/t) distribution for every t if and only if Y has an INH(α, λ) distribution.
d. If Y has the continuous uniform distribution on (0, 1), then X = [λ/((1 − log(1 − Y))^{1/α} − 1)] follows the DINH(α, λ) distribution.

Estimation through maximum likelihood approach with complete data
The method of maximum likelihood is one of the most widely used classical point estimation techniques. The maximum likelihood estimate is the point in the parameter space that maximizes the likelihood function. Its logic is both intuitive and adaptable, and as a result, it has become a dominant approach to statistical inference. Let x_1, x_2, ..., x_n be an RS of size n from the DINH(α, λ) distribution; then the likelihood function (LF) can be written as
$$L(\alpha, \lambda) = \prod_{i=1}^{n} \left[ \exp\left\{1 - \left(1 + \frac{\lambda}{x_i + 1}\right)^{\alpha}\right\} - \exp\left\{1 - \left(1 + \frac{\lambda}{x_i}\right)^{\alpha}\right\} \right].$$
The corresponding log-likelihood (LL) function is
$$LL(\alpha, \lambda) = \sum_{i=1}^{n} \log \left[ \exp\left\{1 - \left(1 + \frac{\lambda}{x_i + 1}\right)^{\alpha}\right\} - \exp\left\{1 - \left(1 + \frac{\lambda}{x_i}\right)^{\alpha}\right\} \right].$$
Taking the partial derivatives of the LL function with respect to the parameters α and λ, we get the following normal equations,
where
The ML estimators of the parameters α and λ can be found by solving Equations (16) and (17); unfortunately, these equations do not admit an analytical solution. Therefore, we use an iterative approach such as Newton-Raphson (NR) to calculate the estimates numerically through built-in routines available in R, MATLAB, or Maple.
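The numerical maximization can be illustrated with a dependency-free Python sketch. Note the substitutions: instead of Newton-Raphson, it uses a simple derivative-free compass (pattern) search on (log α, log λ), chosen only because it needs no derivatives or external libraries; the CDF form F(x) = exp{1 − (1 + λ/(x + 1))^α} is assumed; and the starting point, tolerance, and function names are illustrative. The data are the daily new deaths in Iran listed in the real data analysis section (data set II). The resulting estimates are not claimed to reproduce the article's tables.

```python
import math

def dinh_cdf(x, alpha, lam):
    """Assumed CDF F(x) = exp{1 - (1 + lam/(x + 1))**alpha}, x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return math.exp(-math.expm1(alpha * math.log1p(lam / (x + 1.0))))

def loglik(data, alpha, lam):
    """Complete-data log-likelihood; -inf where it cannot be evaluated."""
    total = 0.0
    try:
        for x in data:
            p = dinh_cdf(x, alpha, lam) - dinh_cdf(x - 1, alpha, lam)
            if p <= 0.0:
                return -math.inf
            total += math.log(p)
    except OverflowError:
        return -math.inf
    return total

def fit_dinh(data, start=(1.0, 1.0), step=1.0, tol=1e-8, max_sweeps=20000):
    """Compass search on (log alpha, log lam); not the Newton-Raphson method."""
    theta = [math.log(start[0]), math.log(start[1])]
    best = loglik(data, math.exp(theta[0]), math.exp(theta[1]))
    for _ in range(max_sweeps):
        if step <= tol:
            break
        improved = False
        for j in (0, 1):
            for delta in (step, -step):
                trial = list(theta)
                trial[j] += delta
                value = loglik(data, math.exp(trial[0]), math.exp(trial[1]))
                if value > best:
                    theta, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # shrink the stencil once no neighbour improves
    return math.exp(theta[0]), math.exp(theta[1]), best

# Data set II (daily new deaths in Iran), as listed in the article
DATA_II = [0, 0, 0, 0, 0, 0, 2, 2, 2, 4, 4, 3, 7, 8, 9, 11, 12, 11,
           15, 16, 16, 21, 49, 43, 54]
alpha_hat, lam_hat, ll_hat = fit_dinh(DATA_II)
print("alpha_hat:", alpha_hat, "lam_hat:", lam_hat, "-LL:", -ll_hat)
```

Working on the log scale keeps both parameters positive without explicit constraints. In practice, a quasi-Newton routine from a numerical library would usually converge faster and also return the Hessian needed for standard errors.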

Numerical illustration with complete data
Here, we present the numerical illustrations of the proposed model based on the empirical and real data sets.

Simulation study under empirical data
In this sub-section, we assess the performance of the ML estimators of the unknown parameters of the proposed model. This assessment consists of the following steps:
a. The MSE of the ML estimates decreases to zero as n tends to infinity. This shows the consistency of the ML estimators. Also, the AABs decrease to zero as n becomes large.
b. For small values of the parameters α and λ, the ML estimators perform better than for large values of α and λ.
c. Based on the MSEs and AABs, we observe that the estimation of the parameter α is more sensitive than the estimation of the parameter λ.

Real data analysis
In this section, we use two real data sets to show that the DINH distribution can be a better model than some of the most popular and recently developed discrete distributions. The proposed model is compared with the competitive models listed in Table 4. To compare these models, the -LL, the Kolmogorov-Smirnov (K-S) statistic with its p-value, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the corrected Akaike information criterion (CAIC) are used to choose the best-fitted model. Based on these criteria, the model with the smallest values of -LL, K-S, AIC, BIC, and CAIC and the largest p-value is the best-fitted model for a given data set compared to the other fitted models. The real data sets used here are the daily new cases of COVID-19 and the daily new deaths due to COVID-19 recorded in two regions. The detailed data description and their analysis are as follows:
This data set is modelled with the DINH and other competitive models. Table 5 contains the estimated parameters and their corresponding standard errors (SEs) as well as the various fitting measures discussed earlier. From Table 5, we conclude that the DINH model performs best among the fitted models since it has the lowest values of -LL, AIC, BIC, CAIC, and the K-S statistic, with the highest p-value. Figure 8 (first row) shows the profile plots for the ML estimators of α and λ under data set I, which indicate the unique existence of the ML estimators. The empirical vs fitted CDFs plot in Figure 9 (left panel) also shows that the fitted CDF closely follows the pattern of the empirical CDF.
Data set II: In the second application, we consider the daily new deaths in Iran. The data are available at https://www.worldometers.info/coronavirus/country/iran/ and contain the daily new deaths from 15 February to 10 March 2020. The data values are: 0, 0, 0, 0, 0, 0, 2, 2, 2, 4, 4, 3, 7, 8, 9, 11, 12, 11, 15, 16, 16, 21, 49, 43, 54.
The above data set is modelled with the DINH and other competitive models. The estimated parameters and other fitting measures are reported in Table 6. From the outcomes of Table 6, we conclude that the DINH distribution is the best choice among the competitive models since it has the lowest values of -LL, AIC, BIC, CAIC, and the K-S statistic, with the highest p-value. The unique existence of the ML estimators of α and λ can be verified from the profile plots in Figure 8 (second row). Figure 9 (right panel) also shows that the DINH distribution models the considered data well.
In this section, we derive the ML estimators of the unknown parameters of the DINH distribution for randomly right-censored data. An algorithm to generate randomly right-censored data is discussed for the DINH model. We also provide numerical illustrations based on empirical and real data sets to demonstrate the applicability of the proposed approach for analyzing randomly right-censored data.

Method of maximum likelihood
In the presence of right-censored observations, the contribution of the ith individual to the likelihood function based on an RS (x_i, d_i) of size n is
$$L_i = [P(X = x_i)]^{d_i} [S(x_i)]^{1 - d_i},$$
where d_i is a censoring indicator variable, that is, d_i = 1 for an observed lifetime and d_i = 0 for a censored lifetime (i = 1, 2, ..., n). Under the DINH distribution, the LF for α and λ is given by
The LL corresponding to LF (19) is
The first-order partial derivatives of the LL function with respect to the parameters α and λ provide the following log-likelihood equations,
where Λ(x_i) has already been defined in Equation (18). The ML estimators of the parameters α and λ can be obtained by solving Equations (21) and (22); however, these equations do not admit an analytical solution. As a result, we use an iterative method such as NR to compute the estimates numerically using built-in routines in software such as R, MATLAB, and Maple.
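The censored log-likelihood can be sketched in Python as below. The sketch assumes the CDF F(x) = exp{1 − (1 + λ/(x + 1))^α} and, for a censored observation, uses the contribution S(x) = P(X > x) = 1 − F(x); the latter is one common convention for discrete right-censored data and is an assumption here, since the article's exact definition of S is not visible in this section. The data are the leukaemia remission times of data set III listed in the application section, with asterisked values coded as d = 0. The function name is illustrative.

```python
import math

def dinh_cdf(x, alpha, lam):
    """Assumed CDF F(x) = exp{1 - (1 + lam/(x + 1))**alpha}, x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return math.exp(-math.expm1(alpha * math.log1p(lam / (x + 1.0))))

def censored_loglik(times, observed, alpha, lam):
    """Sum of d*log P(X = x) + (1 - d)*log P(X > x) over the sample."""
    ll = 0.0
    for x, d in zip(times, observed):
        if d == 1:
            contrib = dinh_cdf(x, alpha, lam) - dinh_cdf(x - 1, alpha, lam)
        else:
            contrib = 1.0 - dinh_cdf(x, alpha, lam)  # assumed convention for S(x)
        ll += math.log(contrib)
    return ll

# Data set III: remission times in weeks; asterisked (censored) values have d = 0
TIMES = [1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19, 24, 26,
         29, 31, 42, 45, 50, 57, 60, 71, 85, 91]
CENSORED_AT = {21, 23, 24, 27, 28}  # zero-based positions of 31*, 45*, 50*, 71*, 85*
OBSERVED = [0 if i in CENSORED_AT else 1 for i in range(len(TIMES))]

# Evaluate at the estimates reported in the article for this data set
print(censored_loglik(TIMES, OBSERVED, 0.5556, 24.5700))
```

This function can be passed to any numerical optimizer to obtain the ML estimates; whether it reproduces the reported estimates exactly depends on the censoring convention assumed above.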

Algorithm to simulate random right censored data
In this section, we present a simple algorithm to generate randomly right-censored data from the proposed model. The algorithm consists of the following steps:
Step 1: Fix the values of the parameters α and λ.
Step 5: If x
For a more detailed review of generating randomly right-censored data from a model, one can refer to Ramos et al. [35].
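One possible way to generate randomly right-censored data from the model is sketched below in Python. This is not a reconstruction of the article's algorithm, whose intermediate steps are not available here. The sketch assumes the quantile form implied by the CDF F(x) = exp{1 − (1 + λ/(x + 1))^α}, and it draws censoring times from a discrete uniform distribution on {0, ..., c_max}; both that censoring distribution and the parameter `c_max` are hypothetical choices made only for illustration.

```python
import math
import random

def dinh_quantile(m, alpha, lam):
    """Quantile under the assumed CDF: ceil(lam / ((1 - log m)**(1/alpha) - 1) - 1)."""
    x_m = lam / math.expm1(math.log1p(-math.log(m)) / alpha) - 1.0
    return max(0, math.ceil(x_m))

def simulate_censored(n, alpha, lam, c_max, rng):
    """Return n pairs (x_i, d_i) with x_i = min(t_i, c_i) and d_i = 1 if t_i <= c_i."""
    sample = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:                   # keep u strictly inside (0, 1)
            u = rng.random()
        t = dinh_quantile(u, alpha, lam)  # latent lifetime from the DINH model
        c = rng.randint(0, c_max)         # hypothetical uniform censoring time
        sample.append((min(t, c), 1 if t <= c else 0))
    return sample

data = simulate_censored(200, 1.0, 2.0, 20, random.Random(123))
print(data[:10])
print("proportion observed:", sum(d for _, d in data) / len(data))
```

Changing `c_max` controls the censoring proportion: smaller values censor more lifetimes, larger values fewer.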

Numerical illustration using simulated random right-censored data
The performance of the ML estimators under randomly right-censored data is investigated in this subsection via a simulation study. The whole study is based on RSs drawn from the DINH distribution of sizes 20, 25, ..., 100. The values of the parameters (α, λ) are taken as (0.5, 0.5), (1, 0.5), (5, 0.5), (0.5, 2), and (1, 2). The method described in Section 6.2 is used to produce the required randomly right-censored data. All simulation results are based on 2000 replicates for each sample size and parameter setting. We have calculated the MSE and AAB of the parameter estimates based on these 2000 values, and the findings are shown in Figures 10-14. From Figures 10-14, the following conclusions can be made:
a. The ML estimators of the unknown parameters show the consistency property, i.e., the MSE reduces as the sample size rises.
b. As n becomes larger, the AAB approaches zero.
c. The ML estimators perform better for small values of the parameters α and λ than for large values of these parameters.
d. From the analysis of MSEs and AABs, we find that the estimation of the parameter α is more sensitive than the estimation of the parameter λ.

Application to censored real data
Here, we examine two real data sets to demonstrate the applicability of the DINH model to censored data. These data sets along with their fitting are described as follows:
Data set III: The data below represent the times of remission, in weeks, for a group of 30 leukaemia patients who underwent similar treatment (see Lawless [36], p. 139). The censoring times are indicated with asterisks. 1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19, 24, 26, 29, 31*, 42, 45*, 50*, 57, 60, 71*, 85*, 91. Now, we examine the DINH distribution's appropriateness for modelling the above data. To do this, we use the K-S statistic and its related p-value to assess the goodness of fit. This fitting measure indicates that the proposed model with ML estimates (SEs in parentheses) α̂ = 0.5556 (0.0915) and λ̂ = 24.5700 (7.0738) adequately captures the diversity of the data, as the K-S statistic and related p-value are 0.1311 and 0.6802, respectively. The profile plots in Figure 15 (first row) clearly reveal the existence of the ML estimators, whereas the empirical vs fitted CDFs plot in Figure 16 (left panel) confirms the output of the K-S statistic and its p-value.
Notably, we have considered the floor values of the original data. The ML estimates (SEs) of the DINH distribution's parameters α and λ are 0.7556 (0.5159) and 9.8737 (12.5442), respectively. Using these estimates with the considered data, the K-S statistic and associated p-value are 0.1588 and 0.8438, respectively. This well-known goodness-of-fit measure indicates that the suggested discrete model is adequate for modelling the given data. Graphically, the uniqueness of the ML estimators and the fitting capability can be viewed in Figure 15 (second row) and Figure 16.
Figure 15. The profile plots for α and λ under data set III (first row) and under data set IV (second row).

Conclusion
In this article, we have proposed a new discrete model to analyze complete and censored data. The PMF and HRF of the proposed model take a variety of shapes, which enables the model to capture a wide spectrum of real data. We have derived several of its important statistical properties, including the quantile function, median, moments, skewness, kurtosis, index of dispersion, entropy, expected inactivity time function, stress-strength reliability, and order statistics. The unknown parameters of the proposed model are estimated by the maximum likelihood approach under complete and censored data. An algorithm to produce randomly right-censored data is provided. Extensive simulation studies are presented to assess the ML estimators under complete and censored data. Finally, the fitting capability of the proposed model for complete and censored data is illustrated using four real data sets. Hence, we can conclude that the suggested model may be used as an alternative to some well-known existing models to analyze complete and randomly right-censored data arising in various domains. Future work might examine other types of censored data using the proposed model (see Tyagi et al. [37]). We may investigate the load-share model in which the component failure time follows the DINH distribution. The stress-strength parameter may be examined under various censoring schemes. In addition, a bivariate extension of the DINH distribution can be developed.