Parameter Estimation in Multivariate Gamma Distribution

Multivariate gamma distribution finds abundant applications in stochastic modelling, hydrology and reliability. Parameter estimation in this distribution is a challenging one as it involves many parameters to be estimated simultaneously. In this paper, the form of multivariate gamma distribution proposed by Mathai and Moschopoulos [9] is considered. This form has nice properties in terms of marginal and conditional densities. A new method of estimation based on optimal search is proposed for estimating the parameters using the marginal distributions and the concepts of maximum likelihood, spacings and least squares. The proposed methodology is easy to implement and is free from calculus. It optimizes the objective function by searching over a wide range of values and determines the estimate of the parameters. The consistency of the estimates is demonstrated in terms of mean, standard deviation and mean square error through simulation studies for different choices of parameters.


Introduction
Three-parameter gamma distribution stands central in the definition of various forms of multivariate gamma distribution.The probability density function of three-parameter gamma distribution with parameters namely, α (shape), β (scale) and µ (location) denoted by Gamma (α, β, µ) is defined as For more details on the properties and applications of this distribution, see Johnson et al. [7].Princy [12] discusses an application of the extended compound gamma model.Multivariate gamma distributions with gamma marginals are common in literature.Several particular cases of these multivariate gamma densities including the bivariate cases have gained prominence over the years.Some examples include bivariate gamma distributions due to Cheriyan [3], Mathai and Moschopoulos [10], McKay [11], Kibble [8], Royen [14], Jensen [6], Sarmanov ([16], [17]).For an elaborate discussion of these distributions along with their applications, generalizations, method of construction and inter-relationships, one may refer to Balakrishnan and Lai [1], Samuel Kotz et al. [15] and Yue et al. [18] and the references cited therein.
The requirement of the common scale parameter β is to ensure that marginals are of the same form.Figure 1 depicts the density function of the multivariate gamma distribution with k = 2 for three different choices of parameters.This distribution is especially useful for models in reliability theory and stochastic process.See Mathai and Moschopoulos [9].In hydrology, it is extensively used in multivariate flood frequency analysis.See Yue [19].
McKay's bivariate gamma distribution is a special case of the form defined in (2) with µ 1 = µ 2 = 0. Clarke [4] employed McKay's bivariate model for estimating the mean annual streamflow from precipitation data.Some important properties of k-variate gamma distribution as discussed in Mathai and Moschopoulos [9] are listed below.
• The marginal distribution of Z i is three-parameter gamma distribution of the form given in (1), i.e. .
• The conditional density of , where Beta I (.) and Beta II (.) denote respectively, beta distribution of first and second kind.

It is interesting to note that, in terms of
Stat., Optim.Inf.Comput.Vol.
Owing to the presence of a common scale parameter, this can be expressed as a product of k univariate three-parameter gamma distributions i.e.
The above form serves as an aid to the development of a heuristic methodology to estimate the (2k + 1) unknown parameters of the distribution.In this paper, we propose an approach to parameter estimation for the model in (4) involving the concepts of Maximum Likelihood (ML), Least Squares (LS) and Maximum Product of Spacings (MPS).
The motivation for the present study is due to the fact that parameter estimation has not been attempted for this distribution before.Hence, an attempt in this direction will offer insights into the issues and challenges if any, in parameter estimation.This will provide opportunities to develop new methodologies thereby widening the scope of application of the distribution in other domains of research.
The primary contribution of this paper is to propose a heuristic algorithm for parameter estimation and assess its performance through simulation studies.The proposed approach is easy to implement and is free from calculus.It optimizes the objective function by searching over a wide range of values and determines the estimate of the parameters.Rest of the paper is structured as follows.Section 2 details the methods of estimation in multivariate gamma distribution.Section 3 contains a method for random variate generation from the k-variate gamma distribution and the algorithmic description of the proposed methodology.In Section 4, we present and discuss simulation results for bivariate gamma distribution.The estimates obtained are compared in terms of Average (AVG), Standard Deviation (SD) and Mean Square Error (MSE).Section 5 concludes the paper with discussion.

Methods of Estimation
The complex structure of the k-variate gamma distribution requires solving non-linear equations for obtaining estimates.Even typical iterative optimization procedures like Nelder-Mead Algorithm, Genetic Algorithm (GA) often fail to produce results.For instance, Nelder-Mead method results in "Not a Number" (NaN) when implemented via optim() function in R.Moreover, the accuracy of the estimates obtained through these algorithms decreases when the dimension of parameters increase.
In situations where the marginals follow gamma distribution, method of moments has been extensively used in parameter estimation.See Yue [19].However, Fisher [5] has reasoned that the classical method of moments in general is inefficient, except when it closely approximates normality and recommends the use of ML estimation.Hence, to estimate the (2k + 1) parameters of k-variate gamma distribution, the methods of ML as well as MPS and LS are considered.

Maximum Likelihood Estimation
Based on a random sample of size n from k-variate gamma distribution with probability density function defined in (2), the likelihood (L) and log-likelihood (log L) functions, respectively are given as The corresponding (2k + 1) log-likelihood equations are: ) where . The non-linear equations defined in ( 7)-( 9) have no closed form solutions.As pointed out earlier, numerical optimization algorithms like Nelder-Mead, GA often fail to provide solutions owing to the large number of parameters.It is found that these methods fail to produce numerical estimates even when k = 2. Also, Fisher's scoring method cannot be used since the regularity conditions are not met.It is interesting to note that the likelihood function defined in (5) can be expressed as a product of k univariate three-parameter gamma likelihoods using (4).As a consequence, the unknown parameters of the k-variate gamma distribution are estimated using its marginal distributions.In this perspective, we consider n random samples from a three-parameter gamma distribution with density defined in (1).The corresponding likelihood L and log-likelihood functions log L are In order to obtain estimates for the unknown parameters, the proposed methodology resorts to direct maximization of the function given in (10).
. The maximum product of spacings estimator for θ is the one which maximizes the logarithm of the geometric mean of sample spacings i.e. θ = arg max θϵΘ S n (θ), where The direct maximization of the function defined in (13) would in turn, lead to estimates for the unknown parameters.

Least Squares Estimation
Based on the fact that the k-variate gamma distribution defined in (2) can be expressed as a product of k univariate gamma distributions as given in (4), we consider n samples from each of the marginal densities.Let X (1) , X (2) , ..., X (n) represent the corresponding ordered observations.LS estimation involves estimating the parameters by minimizing the function L n (θ) defined as , θ = (α, β, µ) (14) where F ( x (i) ; θ ) , i = 1, 2, ..., n represents the distribution function evaluated at the observed value of X (i) .The minimization of the function defined in ( 14) produces the required estimates for the unknown parameters.

Proposed Methodology
In this section, we present a method for generating observations from k-variate gamma distribution of the form given in (2).Following this, the algorithm for determining the estimates of (2k + 1) unknown parameters of kvariate gamma distribution using ML, MPS and LS methods is described.

Random Variate Generation
The following steps are used to generate n observations from k-variate gamma distribution.

Split-Join Algorithm
Consider a random sample of size n from k-variate gamma distribution with (2k + 1) unknown parameters.In order to estimate the parameters, we propose an algorithm called Split-Join algorithm.This involves 'Splitting' the k-variate gamma distribution into k univariate three-parameter gamma distributions followed by estimation of each of its parameters and then 'Joining' the resulting estimates thereby giving estimates for the (2k + 1) parameters.
Given the k-variate gamma variable Z, the algorithm requires splitting of Z to k univariate three-parameter gamma variables V i where denote the parameters to be estimated in V i .Specify the boundaries of parameter search space by defining a lower bound (low θi ) and upper bound (up θi ) for each parameter in θ i .Let (seq αi ),(seq β ), and (seq µi ) represent the sequence of values generated between (low θi ) and (up θi ) by an incrementing factor I. The algorithm is given below.for p in (seq αi ) do 5: for q in (seq β ) do 6: for r in (seq µi ) do 7: for s in 1 and n do θ ← (p,q,r) corresponding to optim(obj) 12: End Loop 13: Combine all θi , i = 1, 2, ..., k to obtain required estimates.
The objective function obj() in the above algorithm correspond to any of the equations ( 10),( 13) and ( 14) and optim(obj) represents the optimum value of the objective function.Thus, an implementation of the above algorithm will result in estimates θi for θ i by ML, MPS, or LS method.Combining θi , i = 1, 2, ..., k gives the required estimates for k-variate gamma distribution.It is important to note that the algorithm results in k estimates for the common scale parameter β .To get a single estimate for β , one may take the arithmetic mean of the k estimates or any other meaningful function, thereof.Alternatively, one may fix the estimate of β obtained from the first run of the algorithm for the successive runs.However, this approach is meaningful only when ML method is employed due to the fact that the likelihood function of the k-variate gamma distribution can be expressed as product of the likelihoods of marginal distribution.

Simulation results
The proposed methodology is implemented in R 3.1.0for the case of bivariate gamma distribution.Datasets of sizes n = 20, 50 and 100 are simulated for MC runs(m) = 100, 500 times each with the choice of parameter values as shown in Table (I) using the random variate generation technique given in Section 3.1.The parameters are chosen so that the correlations of observations typically fall under three categories namely low, moderate and high.The correlations under the above three cases are 0.232, 0.714 and 0.939 respectively.The mean value of the estimates (AVG), its standard deviation (SD) and mean square error (MSE) obtained by ML, MPS and LS methods based on m = 100, 500 runs of the algorithm for each of the above cases are presented in Tables (II) to (X).
Note that LSE-S, MLE-S and MPS-S refer to LS, ML and MPS methods under Split-Join Methodology.MLE-M refers to ML method wherein the estimate of β obtained from the first run of the Split-Join algorithm is used throughout.methods proposed.However, LSE-S does not produce consistent estimates for all the parameters as is evident from Figure 2.
With an increase in correlation (Case 2), it is noted that the proposed methods provide similar estimates for the parameters as seen in Tables (V) to (VII).But, from Figure 3, it is evident that MLE-M outperforms others in terms of MSE .Considering Case 3, which is marked by high correlation, it can be seen from Tables (VIII) to (X) that all the proposed methods perform equally well in estimating the shape and scale parameters.However, the methods fail to provide estimates closer to the true value for the location parameters.In general, it is observed that the performance of the proposed methods improve with an increase in sample size and number of simulations.

Discussion
This paper introduces a heuristic approach for parameter estimation in k-variate gamma distribution introduced by Mathai and Moschopoulos [9].The proposed methodology makes use of the marginal distributions instead of directly using the complex structure of the k-variate gamma distribution.This is due to the fact that the probability density function of k-variate gamma distribution defined in (2) can be expressed as product of three-parameter gamma marginals as given in (4).Thus, the complexity associated with estimating all the (2k + 1) parameters simultaneously is reduced as only the parameters of the marginals need to be estimated independently.Since the density in (2) has a common scale parameter β, the 3k estimates obtained for β have to be meaningfully combined to provide a single estimate.It is observed from simulation studies that the arithmetic mean of the 3k estimates obtained for β provides a single estimate that is close to the true parameter value.Alternatively, one can estimate the scale parameter from any of the marginal distributions and fix this value for estimating the parameters of the remaining marginal distributions.
It is to be noted that the proposed methodology directly optimizes the objective function (e.g.likelihood function) of the corresponding univariate marginal distributions through an extensive search in the three-parameter space rather than solving non-linear derivatives.This offers better chances of attaining global optima which is achieved by increasing the width of the interval in which the search is carried out.The steps involved in the proposed methodology are presented in Split-Join Algorithm explained in Section 3.2.It is observed that the precision of the estimates increases by reducing the factor of incrementation.Moreover, the algorithm places no restrictions on the parameter space.However, the algorithm takes more time to arrive at the estimates owing to the fact that it searches the parameter space corresponding to each marginal distribution separately.
Simulation studies performed to obtain estimates for the parameters of bivariate gamma distribution through the proposed methodology involving ML, MPS and LS methods suggest that the resulting values are closer to the parameters.
For implementing the algorithm on real-life datasets that follow k-variate gamma distribution, the following considerations will serve as an aid.A test procedure for testing for common scale parameter of the marginal distributions can be employed before implementing the algorithm.For fixing the lower and upper bounds of the parameters in the search process, one may initially start with a random set of points in the three-dimensional search space and evaluate the objective function corresponding to each of these points.An interval can be defined around the point that optimizes the objective function and the algorithm can then be implemented by setting the endpoints of the interval as the lower and upper bounds.

1 Figure 1 .
Figure 1.Probability Density Function of Bivariate Gamma Distribution for various choice of parameters ) ← optimum of obj()11:

Figure 2 .
Figure 2. Line plot of MSEs corresponding to Case 1

Table I .
Choice of parameter values for simulating samples from bivariate gamma distribution

Table II .
AVG, SD and MSE of estimates for n=20, Case 1

Table IV .
AVG, SD and MSE of estimates for n=100, Case 1

Table VI .
AVG, SD and MSE of estimates for n=50, Case 2

Table VIII .
AVG, SD and MSE of estimates for n=20, Case 3

Table X .
AVG, SD and MSE of estimates for n=100, Case 3