On parameter estimation in multi-parameter distributions

Many-multi parameter distributions have limit cases containing fewer parameters. This paper demonstrates that, when fitting distributions to data realized from a distribution resembling one of these limit cases, the parameter estimates obtained vary wildly between estimators. Special attention is paid to the modelling of financial log-returns. Two classes of estimators are used in order to illustrate the behaviour of the parameter estimates; the maximum likelihood estimator and the empirical characteristic function estimator. This paper discusses numerical problems associated with the maximum likelihood estimator for certain distributions and proposes a solution using Fourier inversion. In addition to simulation results, parameter estimates are obtained by fitting the normal inverse Gaussian and Meixner distributions to smooth bootstrap samples from the log-returns of the Dow Jones Industrial Average index are included as examples.


Introduction and motivation
Multi-parameter distributions are used to model a variety of phenomena.When estimating the parameters of these distributions, the aim is often to accurately model the density function of the underlying distribution from which the data are realized (with little or no regard for the individual parameter estimates themselves).Consider, for instance, the use of a multi-parameter distribution in the modelling of financial log-returns in an option pricing model.The resulting option prices are functions of the density, but not of the individual parameter estimates.
Many multi-parameter distributions possess limit cases containing fewer parameters.The aim of this paper is to numerically demonstrate the instability of the resulting parameter estimates when a multi-parameter distribution is fitted to a distribution resembling one of these limit cases.In these cases, the instability in the parameter estimates can, at least partially, be explained by noting that a distribution containing redundant parameters is fitted to the data.
Two different estimators are used in order to illustrate the instability of the parameter estimates: the maximum likelihood estimator (M LE) and the empirical characteristic function estimator (ECF E).Many multi-parameter distributions have complicated densities.In some cases, the calculation of these densities lead to numerical problems.These problems and their effects on the calculation of the M LE are discussed in this paper.The ECF E is a popular alternative to the M LE, especially in cases where the characteristic function of the distribution has a form that is simpler to evaluate numerically than is the case for the density function.
The properties of the M LE are well-known and are available in many standard texts on asymptotic theory.For example, [4] discusses the strong consistency of M LEs together with the corresponding assumptions in Chapter 4. The same text examines the asymptotic normality and asymptotic efficiency of M LEs in Chapters 18 and 20 respectively.Other excellent references on the properties of M LEs include Chapters 5 and 7 of [13] as well as Chapter 7 of [9].
The second estimation method, the ECF E, minimizes a distance measure between the characteristic function and its empirical counterpart.Minimum distance estimators, of which the ECF E is a special case, were introduced in a general setting in [15].[11] proposed the use of the ECF E for estimating the parameters of the stable law.Thereafter, the asymptotic properties of this estimator for both the single and multi-parameter cases were considered in [8].In the mentioned paper, the author showed that, under certain regularity conditions, the estimator is consistent and asymptotically normally distributed.The efficiency of the ECF E relative to the M LE is also considered in this paper.[5] presented results relating to the efficiency of the ECF E. A general exposition of the ECF E and related Fourier methods was presented in [6], while [7] considered the efficiency of empirical characteristic function procedures.
[16] provided an explanation of the use of the ECF E wherein the author favoured a practical exposition.The author pointed out that this estimation method may be preferred in cases where difficulties are encountered when applying likelihood based methods.The paper lists various assumptions implicit in the use of this estimation method and includes a number of examples.In addition to the Monte Carlo results presented, the author also provides a practical application, wherein the ECF E is used in order to fit a model to observed values of the Dow Jones Industrial Average index.
In this paper, the normal inverse Gaussian (N • IG) and Meixner distributions are used as specific examples in the context of financial modelling.These distributions are chosen since they are popular models for financial log-returns.
The remainder of the paper is structured as follows.In Section 2, we discuss the estimators used as well as the method used to obtain starting values for the numerical optimization procedures required in order to calculate the estimates.Five empirical examples are shown in Section 3; the examples relating to the N • IG distribution and the Meixner distribution are discussed separately.Section 4 is the conclusion.

Estimation methods
Two different estimators, the maximum likelihood estimator (M LE) and the empirical characteristic function estimator (ECF E), are used in order to obtain the empirical results shown in Section 3.

The maximum likelihood estimator
The density functions of many multi-parameter distributions are known.In these cases it is, in principle, possible to calculate M LEs.Usually, no formulae are available for the M LEs and numerical optimization is used to maximize the likelihood function.However, as a result of the limited computational power of the computer packages used, the calculation of the density function (and likelihood function) may break down for certain parameter sets.
Often, distributions with complicated density (and likelihood) functions have much simpler characteristic functions.In these cases, the well-known Fourier inversion formula can be used to evaluate the density numerically.Denote by f (x; θ) and ϕ (t; θ) the density and characteristic functions of some distribution with parameter set θ.The Fourier inversion formula can be expressed as: (1) see [3].In cases where the direct numerical calculation of the density function breaks down, (1) can be used to calculate the density function (and the likelihood function).For distributions for which numerical problems are encountered when calculating the density function we distinguish between two methods of obtaining M LEs.In the first one we use an algorithm that sets the likelihood function to 0 for parameter sets for which the density cannot be calculated directly; we call this method DM LE (direct M LE).In effect, this method maximizes the likelihood function by varying the parameter estimates subject ON PARAMETER ESTIMATION IN MULTI-PARAMETER DISTRIBUTIONS to the restriction that the density function can be calculated directly (without the use of Fourier inversion).In the second method, (1) is used to evaluate the likelihood when direct calculation of the likelihood is not feasible.This method is referred to as IM LE (indirect M LE).Depending on the form of the characteristic function of the distribution considered, the calculation of the IM LEs may require substantial amounts of computer time.

The empirical characteristic function estimator
The ECF E obtains parameter estimates by minimizing a given distance measure between the theoretical characteristic function of the distribution and the empirical version of this function.The characteristic functions of the N • IG and Meixner distributions are substantially simpler than the corresponding density functions, meaning that the ECF E is simpler to calculate for these distributions.Two variations of the ECF E are considered, the difference between the two estimators being the weight function used.
The empirical characteristic function of the dataset x 1 , x 2 , ..., x n is The ECF E of θ is defined as the minimizer over θ of the expression where w is some positive weight function satisfying Two variations of the weight function are used: and Below, ECF EA and ECF EN respectively denote the estimators obtained using (3) and (4).As was the case for the M LEs, in general, no formulae exist for the ECF Es so numerical optimization is employed in order to calculate the estimates.

Starting values
The numerical optimization procedures used to obtain parameter estimates require starting values.Since optimal starting values are generally unknown in practice, it is often advisable to make use of a random procedure to choose starting values.In the numerical results shown below, starting values for the M LEs are obtained as follows.First, a range for the starting values of each of the parameters of the distribution is specified; this range is specified by the user and is chosen so as to include values deemed likely based on previous experience.Then, a possible starting value for each of the parameters is simulated independently from a uniform distribution.Next, the likelihood function is calculated for the realized parameter set.Finally, this process is repeated a large number of times and the parameter set associated with the largest value of the likelihood function is chosen as the set of starting values for the optimization procedure.
A similar approach is used in order to obtain starting values for the ECF Es.In this case, the distance defined in ( 2) is calculated and the parameter set resulting in the smallest distance is used as the starting values for the optimization algorithm.
The numerical results shown below were obtained by generating 10 000 random possible starting values for each optimization procedure used.The ranges chosen for the starting values are omitted for the sake of brevity.
In the case of the IM LE, the time required to calculate the likelihood function for all of the possible starting values is extreme; in some cases, the calculation of the IM LE required roughly 1 000 times as much computer time as was the case for the DM LE.In order to alleviate this computation burden, the likelihood function is not calculated for each possible set of starting values.Instead, the procedure used is as follows.The mean and the standard deviation of the dataset under consideration are calculated.The mean of the theoretical distribution for each of the parameter sets considered as possible starting values is also calculated.Thereafter, the likelihood function is only evaluated in cases where the mean of the theoretical distribution is within three standard deviations of the mean of the dataset.The remaining combinations of possible starting values are discarded.Similar techniques to reduce the time required to calculate the remaining estimates are deemed unnecessary.
The aim of this paper is to illustrate the instability of the parameter estimates obtained when using different estimators for multi-parameter distributions.In terms of the above-mentioned method, the algorithms corresponding to the different estimators may use different starting values.The author would like to emphasize that this is not the cause of the vast differences between the estimates obtained in the empirical results shown below.Given a dataset, the parameter estimates calculated using the different estimators vary wildly even when the procedures use the same starting values.

Empirical results
The empirical results shown below were achieved using the N • IG and Meixner distributions.[1] presents a set of properties common to all financial returns.In his paper, the author argues for the use of distributions with at least four parameters as models for financial returns.Note that both the N • IG and Meixner distribution each contain four parameters.Three examples relating to the N • IG distribution and two relating to the Meixner distribution are presented.Some of the properties of these distributions are highlighted below before the examples are presented.
The empirical results relating to the first and second application of the N • IG distribution were obtained as follows.250 realizations (which roughly corresponds to the number of business days in a year) were simulated from a N • IG distribution with known parameter set.The parameters were then estimated using each of the DM LE, IM LE, ECF EA and ECF EN .This process was repeated 1 000 times in each example.A similar approach is used in order to obtain the results presented in the fourth example, where the N • IG distribution is replaced by the Meixner.
Unlike the examples mentioned above, the third and final examples are not concerned with data simulated from a known distribution.These examples are based on the observed log-returns of the Dow Jones Industrial Average (DJIA) index for the period starting on January 1, 2017 and ending on January 1, 2018.This index is comprised of a price-weighted average of 30 large publicly owned companies in the United States.A total of 250 log-returns were calculated from the prices for this index, retrieved from http://finance.yahoo.com.
Samples were drawn from the distribution underlying the calculated log-returns using the smooth bootstrap.(See [2].)The procedure used to sample from the underlying distribution is as follows.The bandwidth h used in the simulation was calculated using Silverman's rule of thumb; ) where σ denotes the sample standard deviation and n denotes the sample size (250 in this case).The value of h was calculated to be 0.0015 for this dataset.Then, smooth bootstrap samples were achieved by resampling 250 values from the observed log-returns with replacement and adding an independent realizations from a N ( 0, h 2 ) distribution to each sampled value.This process was repeated 1 000 times.The N • IG and Meixner distributions were fitted to each of the resulting samples.
As mentioned above, both of the distributions considered are characterized by four parameters.For the sake of brevity, this paper discusses the estimates obtained for one of these parameters only in each example.In the case of the N • IG distribution, the estimates of the parameter denoted by α are considered throughout.In the case of the Meixner distribution, the estimates achieved for the scale parameter α are considered in the simulated example, while the results relating to the shape parameter δ are used in the analysis of the log-return data.Similar configurations were observed for the estimates of the remaining parameters.Boxplots are employed to showcase the resulting values of the chosen parameter estimates.Note that a cross is used in order to indicate an outlier in the boxplots presented throughout.
The numerical results achieved below were calculated using Matlab.The fminsearch.m subroutine in Matlab was used in order to calculate the presented parameter estimates.The mentioned subroutine uses the Nelder-Mead optimization algorithm.(See [10].) where α > 0, |β| < α, µ ∈ R, δ > 0, and K 1 denotes the modified Bessel function of the third kind with index 1.
For the definition of this Bessel function; see [12].The characteristic function of the [14] introduced an alternative parameterization for the N • IG distribution.Results similar to those shown below were obtained when using this parameterization.
When evaluating the density function of the N • IG distribution for certain parameter sets, numerical difficulties are encountered because of the limited computation power available.The Bessel function K 1 (z) ↓ 0 as z ↑ ∞.Since most computer packages used for numerical work round all sufficiently small positive numbers to 0, the Bessel function is rounded to 0 for sufficiently large arguments.If α ↑ ∞ and δ ↑ ∞, then the exponential and Bessel functions in (5) both have large arguments.If these arguments are large enough, the computer package used sets the value of the Bessel function to 0 and that of the exponential function to ∞ so the calculation of the density function breaks down.This problem can be remedied using characteristic function inversion.The N • IG distribution is an example of a multi-parameter distributions with a special case containing fewer parameters.
For a proof, see the appendix.The first example below simulates from a N • IG with substantial skewness and excess kurtosis.The second example resembles the normal in that it exhibits little skewness and excess kurtosis.
The two examples are contrasted in order to demonstrate the difference in the behavior of estimates when the true distribution resembles its limiting case and when it does not.

Example 1:
The N • IG (1, 0.5, 0, 1) distribution Figure 1 shows boxplots of the realized values of α calculated using the four estimators discussed above.The boxplots suggest that the distribution of each of the estimators is positively skewed.The plots also clearly indicate that there are differences in the values obtained using the various estimators.Nevertheless, the median of each of the estimators is close to the true parameter value of 1. Furthermore, the largest estimate calculated using the M LE is less than 4, while the largest estimate associated with the ECF E is less than 14.
Below, the moments associated with the fitted distributions are compared to the moments of the underlying distribution.The first four standardized central moments (hereafter simply referred to as the moments) of the N • IG distribution associated with each set of parameter estimates are calculated; let ν j (θ) denote the j th moment of the distribution with parameter set θ. Table 1 shows the average and standard deviation (in brackets) of the moments associated with each estimator.The moments of the N • IG (1, 0.5, 0, 1) distribution are included in the table (at the top of the table) for comparison.    1 indicates that the moments associated with the fitted distributions closely resemble those of the true distribution.This observation, combined with the seemingly accurate estimates of α displayed in Figure 1, indicates that the estimation procedures perform satisfactorily in this case.

Example 2:
The N • IG (50, 20, 2, 20) distribution Figure 2 shows a boxplot of the DM LEs of α achieved by fitting the N • IG distribution to the realized samples of the N • IG (50, 20, 2, 20) distribution.The boxplots of the remaining estimators for α are shown in Figure 3.
In contrast to the DM LE, the estimators shown in Figure 3 clearly overestimate the true value of α (note that the scale on the vertical axis of the figure is 10 5 ).The large estimates of α can be explained by recalling that the N • IG distribution approaches the normal distribution for large values of α and δ.In effect, we are fitting a (four-parameter) N • IG distribution to data generated from a distribution resembling a (two-parameter) normal distribution.
Clearly the estimates calculated using the various estimators differ substantially.Moreover, it seems that no simple relations exist between the estimates.In order to illustrate the relationship between the estimates obtained, the correlations between the logarithms of the estimates are considered; logarithms are taken because the distributions of the estimates are positively skewed.The resulting correlation matrix is shown in Table 2.Note that the majority of the off-diagonal entries in this correlation matrix are close to 0.    Keeping the vast differences in the parameter estimates, the associated density functions are more similar than one might expect.Let θDMLE denote an estimate obtained using the DM LE, and let θIMLE denote an estimate obtained using the IM LE.For one of the datasets, the following estimates were obtained:   In order to investigate the similarity between the fitted distributions that Figure 4 seems to imply, we turn to the moments associated with the fitted distributions.Table 3 shows the average and standard deviation (in brackets) of the moments associated with each estimator.As before, the true moments are included for reference.

DM LE IM LE ECF EA ECF EN
True values  The moments associated with each estimator closely resemble the moments of the true distribution.It is interesting to note that the closeness of the fitted densities to the true density (shown in Figure 4) and the close correspondence between the moments (illustrated in Table 3) imply that the estimators succeed in accurately modelling the shape of the true distributions.Nevertheless, the parameter estimates obtained by these estimators vary wildly.

Example 3:
The N • IG distribution fitted to the log-returns of the DJIA As a third example, the N • IG distribution is fitted to samples obtained from the distribution of the log-returns of the DJIA. Figure 5 shows a boxplot of the DM LEs of α calculated using this distribution, while Figure 6 shows boxplots of the remaining estimates.As was the case in the previous example, it is clear that the estimates obtained by the various estimators differ substantially.It is also clear that extremely large estimates are often achieved.Following similar lines of inquiry as before, we are interested in determining whether or not the estimates associated with the different estimators are highly correlated.To this end, Table 4 contains the correlation matrix for the logarithms of the estimates calculated using the various estimators.The correlation between each pair of the different estimators is close to 0.  Next, we compare the moments of the fitted distributions to those of the data in order to determine the adequacy of the performance of the estimators.Table 5 shows the average and standard deviation (in brackets) of the moments associated with each estimator.Note that the entries relating to the first and second moments in Table 5 are multiplied by 10 3 and 10 4 respectively.This is done in order to ease comparisons between the results since these values are close to 0. The same convention is used in Table 8.The results reported in Table 5 clearly indicate that the moments associated with the DM LEs do not perform as well as those associated with the remaining estimators.However, upon further investigation, it becomes clear that the DM LE algorithm is able to accurately model the underlying distribution in all but a few cases.In order to illustrate this, consider the variance implied by the various estimated parameter sets.The variance of the log-returns is 1.737 × 10 −5 , while the mean of the variances implied by the obtained estimates is 4.168 × 10 −3 .Nevertheless, the median of the implied variances is 2.009 × 10 −5 , which closely corresponds to the variance of the log-returns.Figure 7 shows a boxplot of the variances implied by the DM LEs.The figure indicates that the large mean of the variances implied by DM LEs is a result of a few outliers and not a systematic overestimation.For a few samples, the DM LE algorithm was unable to converge to satisfactory results.This can, at least partially, be explained by the fact that the variance of the N • IG distribution becomes small for large values of α, in which case direct evaluation of the likelihood function may not be possible.

DM LE IM LE ECF EA ECF EN
The moments associated with the remaining parameter estimates correspond more closely to the moments of the true distribution.However, the remaining estimates slightly overestimate the skewness and the kurtosis of the log-returns.In this example, the implied moments achieved by the empirical characteristic function estimators are closer to those of the true distribution than is the case for the estimators based on the likelihood function.
Consider the density of the Meixner distribution given in (7).The functions making up this density function can lead to numerical difficulties: cos (β/2) → 0 if β → ±π, and Γ (z) ↑ ∞ as z ↑ ∞.As for the complex gamma function, if k is some constant and z ↑ ∞, then |Γ (z + ik)| 2 ↑ ∞.If, for example, the value of δ is excessively large, then both the gamma function and the complex gamma function take large values and the computer package used may set the value of both functions to ∞.In this situation, the calculation of the density function breaks down.
Similar to the case for the N • IG distribution, the Meixner distribution has the normal distribution as a limit case.The proof of the following theorem is deferred to the appendix.

Example 4:
The M eixner (0.1, −0.2, 0, 10) The parameter in this example are chosen such that the true distribution resembles the normal.Note, however, that the Meixner distribution presented in this example resembles the normal to a lesser degree than is the case for the N • IG distribution of Example 2; the kurtosis of the distribution in the present example is 3.102 compared to the 3.005 of Example 2. As a result, the results presented below are less extreme than those of the mentioned example.The Meixner distribution resembles the normal for small values of α.Since closeness to 0 is hard to judge on a figure, Figure 8 shows boxplots of α−1 .Note that the true value of α −1 = 10.Several large outliers are omitted in the figure; the maximum values associated with the IM LE and ECF E respectively are 1 018 877 and 6 095.
The results presented in Figure 8 indicate that the value of α −1 is often overestimated by a substantial margin (meaning that the realized values of α are often much closer to 0 than the true value).However, the estimators associated with the ECF E achieves medians which are close to the true value of α −1 , while the medians of both of the M LEs substantially overestimate this value.As before, the moments associated with the true and fitted distributions are compared in Table 6.The main entries in the table are the averages of the moments associated with the fitted distributions, while the associated standard deviations are reported in brackets.The averages and standard deviations associated with the first and second moments of the fitted distributions are identical.Nevertheless, the M LEs clearly outperform the ECF Es in their ability to accurately model the skewness and kurtosis of the underlying data.This observation is especially interesting in light of the fact that the boxplots presented in Figure 8 seemed to indicate that the ECF Es were better able to accurately estimate the true value of α −1 .

Example 5:
The Meixner distribution fitted to the log-returns of the DJIA As a final example, the Meixner distribution is fitted to smooth bootstrap samples taken from the distribution underlying the log-returns of the DJIA.Boxplots of the estimates obtained for δ using the various estimators are shown in Figure 9. Once again, the distributions of the estimates are positively skewed.
Table 7 shows the correlation matrix for the logarithms of the estimates.As was the case when fitting the N • IG distribution to these data, the off-diagonal elements of the correlation matrix are close to 0.  discards parameter estimates for which the calculation of the likelihood function via the direct calculation of the density function breaks down while the second uses Fourier inversion to evaluate the likelihood in these cases.

DM LE IM LE ECF EA ECF EN
The ECF E obtains parameter estimates by minimizing a distance measure between the theoretical and empirical characteristic functions.Two variations of the ECF E are considered, the difference between the two being the weight functions used.
Specific empirical examples relating to the normal inverse Gaussian (N • IG) and Meixner distributions are included.It is shown that both of these distributions have the normal distribution as a limit and the numerical problems associated with the calculation of the density functions of these distributions are explained.
Practical examples are obtained by simulating realizations from the N • IG and Meixner distributions.Samples are also drawn from the distribution of the log-returns of the Dow Jones Industrial Average index using the smooth bootstrap.In spite of the substantially different parameter estimates obtained using the various estimators, the moments of the estimated distributions correspond closely to each other and to that of the distribution from which the data are realized in the vast majority of the cases considered.We conclude that, when multi-parameter distributions are fitted to data realized from a distribution resembling a limit case containing fewer parameters, the estimators are able to accurately model the characteristics of the underlying distribution, even when the estimates obtained differ substantially from the true parameters.

Figure 1 .
Figure 1.Boxplot of the estimates of α in Example 1.

Figure 2 .
Figure 2. Boxplot of the DM LEs obtained in Example 2.

Figure 3 .
Figure 3. Boxplot of the remaining estimates of α in Example 2.

Figure 4
Figure4shows the density functions associated with θDMLE (dashed line) and θIMLE (dotted line).The true density is also included in the figure (solid line) for reference.The density functions achieved are similar in spite of substantial differences in the estimates of the parameters.

Figure 4 .
Figure 4. Comparison of estimated density functions.

Figure 5 .
Figure 5. Boxplot of the DM LEs of α obtained in Example 3.

Figure 6 .
Figure 6.Boxplot of the remaining estimates for α obtained in Example 3.

Figure 7 .
Figure 7. Boxplot of variances implied by the DM LEs.

Figure 8 .
Figure 8. Boxplot of the inverse of the estimates of α in Example 4.

Table 1 :
The moments associated with the various estimates obtained in Example 1.

Table 2 :
Correlation matrix of the logarithm of the estimates for α obtained in Example 2.

Table 3 :
The moments associated with the various estimates obtained in Example 2.

Table 4 :
Correlation matrix of the logarithm of the estimates for α obtained in Example 3. )

Table 5 :
The moments associated with the various estimates obtained in Example 3.

Table 6 :
The moments associated with the various estimates obtained in Example 4.

Table 7 :
Correlation matrix of the logarithm of the estimates for δ obtained in Example 5.