The Use of the Extended Generalized Lambda Distribution for Controlling the Statistical Process in Individual Measurements

In quality control (QC) of individually tested pieces, the distribution of the variable of interest is often not normal. In such cases, using conventional statistical methods to design the control chart (X chart) can produce misleading and inaccurate results, because a failure in the production line may not be detected in a timely manner. This study therefore proposes using the Extended Generalized Lambda Distribution (XGLD) to approximate the distribution of the data and to construct a quality control chart based on it. The distribution function of the data is first estimated by the XGLD, and the control limits are then calculated from this distribution. The research data consist of the tensile strength of 149 aluminum plates, grouped into 22 classes. The chi-square test was used to assess goodness of fit, and the average run length (ARL) was used to evaluate the effectiveness of the proposed model. The results showed that the out-of-control ARL values of the proposed method are lower than those of the conventional method at every shift considered. This suggests that the new method can detect system failures faster and consequently reduce the resulting costs. In addition, since the XGLD is a highly flexible distribution that can approximate many conventional (and even unconventional) distributions with high precision, using the proposed method instead of methods based on other non-normal distributions is expected to increase the accuracy of failure detection.


Introduction
There are many practical situations in which the process is monitored with samples of size n = 1, that is, samples consist of individual units. For example, in destructive testing, each unit is tested individually and is destroyed (or nearly destroyed) by the test. [16] lists several such situations, including the use of automated measurement and inspection technology, slow production rates, long intervals between the production of successive items, repeated measurements that differ only because of laboratory or analytical error, very low standard deviation, costly repeated sampling, and cases in which repeated sampling is unreasonable or impossible.
In such cases, applying control methods that assume normally distributed variables can introduce significant error. In these situations, individual value (IV) and moving range (MR) charts are commonly proposed for continuous variables. In these charts, the moving ranges of the individual observations are used to estimate the standard deviation and, from it, the control limits. These charts are used when only one unit can be sampled at a time ([16]). The moving range between two successive observations is defined as the absolute difference $MR_i = |y_i - y_{i-1}|$, $i = 2, 3, \ldots, T$, where T is the number of sampling times. In the routine analysis of such data, the average of the observations, $\bar{y}$, and the average of the moving ranges, $\overline{MR}$, are calculated first. The control limits are then

$$\text{IV chart:}\quad UCL = \bar{y} + 3\frac{\overline{MR}}{d_2}, \qquad CL = \bar{y}, \qquad LCL = \bar{y} - 3\frac{\overline{MR}}{d_2},$$

$$\text{MR chart:}\quad UCL = D_4\,\overline{MR}, \qquad CL = \overline{MR}, \qquad LCL = D_3\,\overline{MR},$$

where the constants $D_3$ and $D_4$ are defined from the mean $d_2$ and the standard deviation $d_3$ of the relative range $W = R/\sigma$ as $D_3 = 1 - 3d_3/d_2$ and $D_4 = 1 + 3d_3/d_2$ ([16]). One drawback of this approach is that, to estimate process variability, the user may instead want to estimate the process standard deviation with the sample standard deviation S. Sampling units are typically assumed to be independent of each other, but successive observations are not necessarily independent. In fact, using the average moving range to estimate σ emphasizes the variability between successive observations, whereas the sample standard deviation reflects the variability of the entire data set. When the data show an unstable trend, MR and S can give different estimates of the process standard deviation ([3]).
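The IV and MR limits above can be computed as in the following minimal Python sketch. It assumes moving ranges of span 2, so the tabulated constants $d_2 = 1.128$ and $d_3 = 0.853$ for subgroups of size 2 are used, and it truncates the negative $D_3$ at zero, as is customary. The function name `iv_mr_limits` and the sample values are hypothetical.

```python
# Shewhart individuals (IV) and moving-range (MR) chart limits for MR spans of 2.
D2_N2 = 1.128  # d2: mean of the relative range W = R / sigma for n = 2
D3_N2 = 0.853  # d3: standard deviation of W for n = 2


def iv_mr_limits(y):
    """Return ((LCL, CL, UCL) of the IV chart, (LCL, CL, UCL) of the MR chart)."""
    if len(y) < 2:
        raise ValueError("at least two observations are needed")
    # MR_i = |y_i - y_{i-1}| for i = 2, ..., T
    mr = [abs(curr - prev) for prev, curr in zip(y, y[1:])]
    y_bar = sum(y) / len(y)
    mr_bar = sum(mr) / len(mr)
    # D3 = 1 - 3 d3/d2 is negative for n = 2, so it is truncated at zero by convention
    d3_factor = max(0.0, 1 - 3 * D3_N2 / D2_N2)
    d4_factor = 1 + 3 * D3_N2 / D2_N2  # D4, about 3.27 with these rounded constants
    sigma_hat = mr_bar / D2_N2  # moving-range estimate of the process sigma
    iv_chart = (y_bar - 3 * sigma_hat, y_bar, y_bar + 3 * sigma_hat)
    mr_chart = (d3_factor * mr_bar, mr_bar, d4_factor * mr_bar)
    return iv_chart, mr_chart


# Hypothetical measurements, for illustration only
iv, mr = iv_mr_limits([10.0, 10.4, 9.8, 10.1, 10.3])
print("IV chart:", iv)
print("MR chart:", mr)
```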
[2] studied the performance of these charts for individual measurements when the data are not normally distributed. By examining several skewed, non-normal distributions, they found that the performance of the control charts is significantly affected even when the data distribution departs only slightly from normality. In this case, the normality assumption should be checked with statistical tests before the analysis begins, to avoid the consequences of misusing these charts ([17]). [11] obtained X and R control charts for skewed data using the Generalized Lambda Distribution (GLD), the percentile matching method and the weighted variance method with left and right tails. [12] also obtained control charts based on a weighted bootstrap, using a weight for each stratum in skewed populations. The novelty of this paper with respect to [11] and [12] is the application of the Extended Generalized Lambda Distribution (XGLD) to the individual sampling scheme when the population does not necessarily follow the normal distribution.
Therefore, it is necessary to improve the existing methods for drawing control charts in a situation where the distribution of data is not necessarily normal.
[19] used the five-parameter generalized lambda distribution ([13]) for process control. They estimated the distribution parameters with the method of moments (MM) and did not address the statistical control of individual measurements. Therefore, beyond situations in which the distribution of the items under study is unknown, the case in which individual items must be monitored remains to be considered.
In another paper, [18] introduced a new distribution called the Extended Generalized Lambda Distribution (XGLD), which is much more flexible than the five-parameter generalized lambda distribution; they proposed it for approximating unknown data distributions as well as conventional distribution functions.
In this paper, assuming that the distribution of the random variable is unknown, we propose a new method for its statistical control. The proposed method uses the XGLD, which is very flexible and can cover a wide range of conventional and unconventional statistical distributions. The parameters of this distribution are estimated with the method of moments. The difference between this article and the work of [19] is the development of quality control charts for individual samples using the XGLD.
The rest of the paper is organized as follows. After an introduction to the GLD, we introduce the XGLD and the method for estimating its parameters. Then, after describing the research method, we apply the proposed method to the selected data. Finally, the proposed method is validated using the average run length (ARL).

Generalized Lambda Distribution
Many efforts have been made to provide a probability distribution flexible enough to approximate other statistical distributions. [27] introduced the lambda family of distributions through the quantile function

$$Q(p) = \frac{p^{\lambda} - (1-p)^{\lambda}}{\lambda}, \qquad 0 \le p \le 1, \quad \lambda \ne 0,$$

whose shape is determined by the value of the parameter λ. [21] introduced the following four-parameter quantile function for the lambda distribution:

$$Q(p) = \lambda_1 + \frac{p^{\lambda_3} - (1-p)^{\lambda_4}}{\lambda_2},$$

where $p \in (0, 1)$ is the cumulative probability, $\lambda_1$ is the location parameter, $\lambda_2$ is the scale parameter, and $\lambda_3$ and $\lambda_4$ are the shape parameters. This distribution can be regarded as an extension of the Tukey distribution, because setting $\lambda_2 = \lambda_3 = \lambda_4 = \lambda$ and $\lambda_1 = 0$ recovers the Tukey distribution. Following this, [13] introduced the five-parameter generalized lambda distribution as a quantile function in which $\lambda_1$ is the location parameter, $\lambda_2$ and $\lambda_3$ are scale parameters, and $\lambda_4$ and $\lambda_5$ are exponent parameters. There are several methods for estimating the parameters of this distribution; for a review, see [23], [10], [21], [9], [14], [7], [22], [4], [8] and [19].
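As an illustration only, the two quantile functions above and inverse-transform sampling (X = Q(U) with U uniform on (0, 1)) can be sketched in Python as follows. The function names and the example parameter values are hypothetical; the sketch also checks that the four-parameter form reduces to the Tukey form when $\lambda_1 = 0$ and $\lambda_2 = \lambda_3 = \lambda_4 = \lambda$.

```python
import random


def tukey_lambda_q(p, lam):
    """Tukey lambda quantile function Q(p) = (p**lam - (1 - p)**lam) / lam, lam != 0."""
    return (p ** lam - (1 - p) ** lam) / lam


def rs_gld_q(p, l1, l2, l3, l4):
    """Four-parameter GLD of [21]: Q(p) = l1 + (p**l3 - (1 - p)**l4) / l2."""
    return l1 + (p ** l3 - (1 - p) ** l4) / l2


def sample_rs_gld(n, l1, l2, l3, l4, seed=0):
    """Inverse-transform sampling: X = Q(U) with U uniform on the open interval (0, 1)."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()  # in [0, 1); u = 0 is skipped so negative lambdas stay finite
        if u > 0.0:
            draws.append(rs_gld_q(u, l1, l2, l3, l4))
    return draws


# Tukey special case: l1 = 0 and l2 = l3 = l4 = lam
print(rs_gld_q(0.3, 0.0, 0.14, 0.14, 0.14), tukey_lambda_q(0.3, 0.14))
# Hypothetical skewed GLD, for illustration only
x = sample_rs_gld(1000, 1.5, 2.0, 0.3, 0.8)
```

With positive $\lambda_2$, $\lambda_3$ and $\lambda_4$ the quantile function is increasing, so the example draws lie between Q(0) = 1 and Q(1) = 2.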
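Extended Generalized Lambda Distribution (XGLD)
[18], by rewriting the five-parameter generalized lambda distribution of [13] in the form

$$Q(p) = \alpha_0 + \alpha_1 p^{\beta_1} - \alpha_2 (1-p)^{\beta_2},$$

introduced the XGLD as

$$Q(p) = \alpha_0 + \sum_{i=1}^{k} \alpha_i (p - c_i)^{*\beta_i}, \qquad 0 \le p \le 1,$$

where $(u)^{*\beta} = \operatorname{sign}(u)\,\lvert u \rvert^{\beta}$ denotes the monotony power, which keeps $(p - c_i)^{*\beta_i}$ real-valued for $p < c_i$. The five-parameter generalized lambda distribution is the special case of this distribution with k = 2, $c_1 = 0$ and $c_2 = 1$. As k increases, the approximation becomes more precise, although the amount of computation naturally increases as well.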
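In this paper, a simple form of the XGLD with k = 3, $c_1 = 0$ and $c_2 = 1$ is used to determine the control limits under single sampling:

$$x = Q(p) = \alpha_0 + \alpha_1 p^{\beta_1} - \alpha_2 (1-p)^{\beta_2} + \alpha_3 (p - c_3)^{*\beta_3}.$$

In this case, the model has 8 parameters ($\alpha_0, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \beta_3, c_3$). Some relationships between the GLD and XGLD are presented in [18]. Since the XGLD is defined through its quantile function, its density is expressed in terms of $p = F(x)$:

$$f(x) = \frac{1}{Q'(p)} = \left[\sum_{i=1}^{k} \alpha_i \beta_i \lvert p - c_i \rvert^{\beta_i - 1}\right]^{-1}, \qquad x = Q(p),$$

and the m-th moment of this distribution is

$$E(X^m) = \int_0^1 \left[Q(p)\right]^m dp.$$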
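A minimal Python sketch of the XGLD quantile function, its density expressed through $1/Q'(p)$, and its raw moments by numerical integration is shown below. It assumes NumPy and SciPy are available and that the monotony power is the signed power $\operatorname{sign}(u)\lvert u \rvert^{\beta}$. The function names are hypothetical, and the coefficients are the values printed in the Simulation section, used here only as an example.

```python
import numpy as np
from scipy import integrate


def signed_power(u, b):
    """Monotony power (u)^{*b} = sign(u) * |u|**b, increasing in u for b > 0."""
    return np.sign(u) * np.abs(u) ** b


def xgld_q(p, alpha, beta, c):
    """XGLD quantile Q(p) = alpha[0] + sum_i alpha[i] * (p - c_i)^{*beta_i}."""
    p = np.asarray(p, dtype=float)
    terms = (a * signed_power(p - ci, b) for a, b, ci in zip(alpha[1:], beta, c))
    return alpha[0] + sum(terms)


def xgld_density_at_p(p, alpha, beta, c):
    """Density f(x) at x = Q(p), computed as 1 / Q'(p); requires p != c_i."""
    p = np.asarray(p, dtype=float)
    dq = sum(a * b * np.abs(p - ci) ** (b - 1) for a, b, ci in zip(alpha[1:], beta, c))
    return 1.0 / dq


def xgld_raw_moment(m, alpha, beta, c):
    """m-th raw moment E[X^m] = integral of Q(p)**m over (0, 1)."""
    breaks = [ci for ci in c if 0.0 < ci < 1.0] or None  # kinks strictly inside (0, 1)
    value, _ = integrate.quad(lambda p: float(xgld_q(p, alpha, beta, c)) ** m,
                              0.0, 1.0, points=breaks, limit=200)
    return value


# k = 3, c1 = 0, c2 = 1; coefficients as printed in the Simulation section
ALPHA = [1.7337, 0.6425, 0.6145, 0.0018]
BETA = [0.6145, 0.5882, 0.0012]
C = [0.0, 1.0, 0.5202]
print([round(xgld_raw_moment(m, ALPHA, BETA, C), 4) for m in (1, 2)])
```

Passing $c_3$ as a break point helps `quad` with the steep change of the $(p - c_3)^{*\beta_3}$ term when $\beta_3$ is small.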

XGLD Parameters Estimation
There are several methods for estimating GLD parameters, and hence XGLD parameters, including maximum likelihood (ML), the method of moments (MM), percentile matching (PM), probability weighted moments (PWM), minimum Cramer-von Mises estimators (MCM) and partial least squares estimators (PLS). [25] compared these methods for estimating GLD parameters from grouped data. [18] also showed how some of these methods can be used to estimate XGLD parameters. In this research, the method of moments is used to estimate the XGLD parameters. If the model has k parameters, equating the first k distribution moments to the corresponding sample moments yields a nonlinear system of k equations in k unknowns, which can be solved with numerical methods for nonlinear equations.
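The moment-matching step can be sketched as a nonlinear least-squares problem. To keep the example small, this hypothetical sketch fits the four-parameter GLD of [21] (so k = 4) rather than the 8-parameter XGLD, matches raw moments computed by numerical integration, and uses SciPy's `least_squares` as the numerical solver. The residual scaling, the bounds and the starting values are illustrative assumptions, not the procedure of [18]. Because the system can have several solutions, the result depends on the starting values and should be checked, for example with the chi-square test described next.

```python
import numpy as np
from scipy import integrate, optimize


def rs_gld_q(p, theta):
    """Four-parameter GLD quantile Q(p) = l1 + (p**l3 - (1 - p)**l4) / l2."""
    l1, l2, l3, l4 = theta
    return l1 + (p ** l3 - (1 - p) ** l4) / l2


def model_moments(q, theta, k):
    """Raw moments E[X^m], m = 1..k, by numerical integration of Q(p)**m over (0, 1)."""
    return np.array([integrate.quad(lambda p: q(p, theta) ** m, 0.0, 1.0)[0]
                     for m in range(1, k + 1)])


def sample_moments(x, k):
    """Sample raw moments M_m = mean(x**m), m = 1..k."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean(x ** m) for m in range(1, k + 1)])


def moment_residuals(theta, q, target):
    """Scaled differences between model and sample moments (zero at an exact solution)."""
    return (model_moments(q, theta, len(target)) - target) / (1.0 + np.abs(target))


def fit_by_moments(q, x, theta0, lower, upper):
    """Solve the k-equation, k-unknown moment system in the least-squares sense."""
    target = sample_moments(x, len(theta0))
    return optimize.least_squares(moment_residuals, theta0, bounds=(lower, upper),
                                  args=(q, target))


# Hypothetical example: data drawn from a known GLD, fit started from a rough guess
rng = np.random.default_rng(1)
true_theta = np.array([1.5, 2.0, 0.3, 0.8])
x = rs_gld_q(rng.uniform(size=2000), true_theta)
theta0 = np.array([1.4, 1.8, 0.4, 0.7])
lower = [-np.inf, 0.01, 0.01, 0.01]  # keep l2, l3, l4 > 0 so that Q stays increasing
upper = [np.inf, np.inf, np.inf, np.inf]
fit = fit_by_moments(rs_gld_q, x, theta0, lower, upper)
print(fit.x, fit.cost)
```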
This system may have more than one admissible solution. To validate a solution, the chi-square goodness-of-fit test can be used. After classifying the data into h categories, the observed frequency in each category, $O_i$, is computed and compared with the expected frequency in that category, $E_i$. To obtain the $E_i$, the probability of each of the h categories is first computed from the fitted XGLD, and the expected frequency of each category is then obtained by multiplying the total number of observations by the probability of that category (the calculation is explained in detail in [18]). The chi-square test statistic is

$$\chi^2 = \sum_{i=1}^{h} \frac{(O_i - E_i)^2}{E_i}, \tag{6}$$

and under the null hypothesis it has a chi-square distribution with df = h − k − 1 degrees of freedom, where h is the number of categories and k is the number of XGLD parameters (8 in this study). Note that h must be greater than k + 1 for df to be positive. The null hypothesis states that the sample comes from the fitted XGLD, and the alternative hypothesis states the opposite. The null hypothesis is rejected in favor of the alternative whenever $\chi^2 > \chi^2_{\alpha,\,h-k-1}$, where $\chi^2_{\alpha,\,h-k-1}$ is the upper percentile of the chi-square distribution with h − k − 1 degrees of freedom whose right-tail area equals α.
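The chi-square check can be sketched in Python as follows, assuming SciPy. The expected frequencies are obtained by inverting a fitted quantile function numerically; letting the first and last classes absorb the tails is an assumption of this sketch, not necessarily the calculation of [18]. The function names are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def cdf_from_quantile(q, x):
    """F(x) for a continuous increasing quantile function q on [0, 1], via root finding."""
    if x <= q(0.0):
        return 0.0
    if x >= q(1.0):
        return 1.0
    return optimize.brentq(lambda p: q(p) - x, 0.0, 1.0)


def expected_frequencies(q, edges, n):
    """E_i = n * [F(b_i) - F(b_{i-1})]; the outer classes absorb the tails (assumption)."""
    probs = [0.0] + [cdf_from_quantile(q, b) for b in edges[1:-1]] + [1.0]
    return n * np.diff(probs)


def chi_square_gof(observed, expected, n_params, alpha=0.05):
    """Chi-square statistic (6), upper critical value, df = h - k - 1 and reject flag."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    df = observed.size - n_params - 1
    if df <= 0:
        raise ValueError("the number of classes h must exceed n_params + 1")
    statistic = float(np.sum((observed - expected) ** 2 / expected))
    critical = float(stats.chi2.ppf(1.0 - alpha, df))  # right-tail area alpha
    return statistic, critical, df, statistic > critical


# h = 22 classes and k = 8 XGLD parameters give df = 13, as in the numerical example
print(stats.chi2.ppf(0.95, 13), stats.chi2.ppf(0.05, 13))
```

With h = 22 and k = 8, the upper 5% critical value is about 22.36; the value 5.89 is the lower 5% point of the same distribution.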

Non-Normal Statistical Quality Control
To obtain the limits of quality control charts, the density function of the data distribution must be available. When the density function is unknown, [28] used B-splines to estimate it and then construct control charts. [5] used the Gram-Charlier density function to obtain a standard space model. [1] used the power normal family, and [15] used the Burr distribution to estimate the density function. [29] analytically estimated the shape of the probability density function using the Fokker-Planck-Kolmogorov equation for a stochastic system together with a shape control procedure. [24] pointed out that the GLD can approximate the density function of production-line data very well. [6] showed how to use the GLD for statistical process control, and [20] used the lambda distribution to compute process efficiency indices.
In this paper, we construct individual value control charts using the XGLD and compare them with the moving range charts for individual samples. These charts can monitor the production process effectively and help identify deviations or changes in the distribution that may go undetected when only summary statistics are used or when the sample is assumed to come from a normal population.
In this research, the high flexibility of the XGLD (even compared with the GLD) allows it to be applied in many quality control schemes even when the population distribution is unknown.

Control limits based on XGLD
Control limits are used to detect signals in the process data indicating that the process is not in control. If one or more plotted points fall outside the control limits, the user should suspect that the process generating the data may be out of control. For individual samples, IV and MR charts are commonly used. This paper proposes using the XGLD to obtain the control limits; the same approach can in principle be applied to most other control charts.
Here, because the distribution of the population from which the samples are drawn is unknown, it is assumed without loss of generality that the available data represent the population under study, and the empirical distribution function is used as an estimate of the unknown population distribution function ([26]). The algorithm used to obtain the control limits based on the XGLD is as follows:
1. Start the production line and make sure that it is in control.
2. Determine acceptable probabilities of type I and type II errors, as well as the size of the shift in the mean to be detected.
3. Take individual samples from the production process.
4. Draw a histogram of the data set.
5. Estimate the XGLD parameters using the method of moments:
   a. Using [13], estimate the parameters of the four-parameter GLD.
   b. Estimate the XGLD parameters using the relationship between the GLD and XGLD parameters and the algorithm given in [18].
6. Calculate the upper, center and lower control limits from the distribution obtained in step 5. If the histogram in step 4 indicates an asymmetric population distribution, use probability limits instead of the 3σ approximation; otherwise, the 3σ approximation can be used.
7. After obtaining the control limits, return to the sampling in step 3. The control limits do not need to be updated for each new individual sample. Using the limits calculated in step 6, check whether the production process is in control. If it is, go to step 5. Otherwise, stop the production process, identify and eliminate the causes of the deviation, and then go to step 3.
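The two ways of setting the limits in step 6 can be contrasted with a short Python sketch. Probability limits are taken as the α/2 and 1 − α/2 quantiles of the fitted XGLD, with the median as the center line (an assumption of this sketch), while the 3σ limits use the sample mean and standard deviation. The XGLD coefficients are those printed in the Simulation section, used only as an example, so the output is not expected to reproduce the limits reported in the numerical example; the function names are hypothetical.

```python
import numpy as np


def signed_power(u, b):
    """Monotony power (u)^{*b} = sign(u) * |u|**b."""
    return np.sign(u) * np.abs(u) ** b


def xgld_q(p):
    """XGLD quantile with the coefficients printed in the Simulation section."""
    p = np.asarray(p, dtype=float)
    return (1.7337 + 0.6425 * p ** 0.6145 - 0.6145 * (1 - p) ** 0.5882
            + 0.0018 * signed_power(p - 0.5202, 0.0012))


def probability_limits(q, alpha=0.05):
    """(LCL, CL, UCL) = (Q(alpha/2), Q(1/2), Q(1 - alpha/2)); median as center line."""
    return float(q(alpha / 2)), float(q(0.5)), float(q(1 - alpha / 2))


def three_sigma_limits(x):
    """Conventional mean +/- 3 s limits estimated from the data."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std(ddof=1)
    return m - 3 * s, m, m + 3 * s


lcl, cl, ucl = probability_limits(xgld_q, alpha=0.05)
print(f"XGLD probability limits: LCL={lcl:.4f}, CL={cl:.4f}, UCL={ucl:.4f}")
```

For a skewed fitted distribution, the probability limits are not symmetric about the center line, which is the reason step 6 prefers them over the 3σ approximation.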
With this algorithm, a distribution approximating the actual distribution of the data is obtained at each sampling time. Therefore, changes in the process, in both the mean and the standard deviation, can be recognized at each sampling time.

Using XGLD to design a control chart (numerical example)
In this section, 149 measurements of the tensile strength of aluminum plates are used to illustrate how the XGLD is used to construct control charts. Since the XGLD has k = 8 parameters and the chi-square statistic has h − k − 1 degrees of freedom, the number of categories h must be greater than k + 1 = 9. Hence, the data are classified into 22 categories, as shown in Table 1. Below, the estimation of the XGLD parameters and their use in constructing the control charts are described. For these data, the sample moments of orders 1 to 8, $M_1$ to $M_8$, are computed, and the XGLD parameters are estimated by equating them to the corresponding distribution moments and solving the nonlinear system of eight equations in eight unknowns

$$\mu_m(\alpha_0, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \beta_3, c_3) = M_m, \qquad m = 1, \ldots, 8, \tag{7}$$

where $\mu_m$ denotes the m-th moment of the XGLD. Solving system (7) gives the parameter estimates shown in Table 2. Using these estimates, the first to eighth moments of the fitted XGLD are given in Table 3, together with the sample moments computed from the data. The fitted and sample moments are very close, which indicates that the XGLD approximates the distribution of the data accurately.
To test whether the data actually follow the XGLD with these parameters, the chi-square goodness-of-fit test of formula (6) was used. For each category, Table 1 gives the category limits, the observed and expected frequencies, and the contribution of that category to the test statistic.
The critical value of the chi-square distribution with 13 degrees of freedom (df = 22 − 8 − 1 = 13) at a type I error rate of α = 0.05 is $\chi^2_{0.05,\,13} = 22.36$. Since the test statistic is less than this critical value, the null hypothesis is not rejected at the α = 0.05 level. Therefore, the fitted XGLD is not rejected for this data set; that is, the data are consistent with this distribution.
The control limits are determined with α = 0.05. The conventional (Montgomery) limits are computed from the average and standard deviation of the data reported in Table 3, while the control limits computed with the XGLD approach are (1.123, 1.840). The XGLD limits are tighter than the Montgomery limits. Figure 1 shows the control charts of the data under both the Montgomery and XGLD approaches.
Figure 1. Individual control charts of the data using the Montgomery (continuous red lines) and XGLD (dashed green lines) approaches

Validating the proposed model using ARL
The average run length (ARL) of a control chart is one way to evaluate a quality control method. The ARL is the average number of points plotted on the control chart before an alarm is signaled; for a Shewhart control chart, it equals the reciprocal of the probability that a point falls outside the control limits (Montgomery, 2012). Thus $ARL_0 = 1/\alpha$ for an in-control process and $ARL_1 = 1/(1-\beta)$ for an out-of-control process, where α and β are the probabilities of type I and type II errors, respectively. The choice of α usually depends on the characteristics of the production line and the type of product. For 3σ control limits under normality, α = 0.0027. Because α and β are related, control charts with the same α are compared by their type II error: of two charts with the same α, the one with the lower $ARL_1$, which stops an out-of-control process sooner, is more efficient. In this research, α = 0.05 for each chart.
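If the distribution of the data is known, the probability of a type II error is

$$\beta = P(LCL \le X \le UCL \mid \mu = \mu_1) = F_1(UCL) - F_1(LCL),$$

where $F_1$ is the distribution function of the process after the mean shifts from $\mu_0$ to $\mu_1$. For example, if the data are normally distributed and the limits are $\mu_0 \pm z_{\alpha/2}\,\sigma$, then

$$\beta(\delta) = \Phi(z_{\alpha/2} - \delta) - \Phi(-z_{\alpha/2} - \delta) = \int_{-z_{\alpha/2}-\delta}^{\,z_{\alpha/2}-\delta} \phi(z)\,dz,$$

where $\delta = (\mu_1 - \mu_0)/\sigma$, and Φ and φ denote the distribution and density functions of a standard normal random variable.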
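The normal-theory β(δ) and the corresponding ARL can be evaluated with a few lines of Python, assuming SciPy. The sketch assumes limits $\mu_0 \pm z_{\alpha/2}\sigma$ for individual observations; the function names are hypothetical.

```python
from scipy.stats import norm


def beta_normal(delta, alpha=0.05):
    """Type II error of an individuals chart with limits mu0 +/- z_{alpha/2} sigma
    when the mean shifts by delta standard deviations."""
    z = norm.ppf(1 - alpha / 2)
    return norm.cdf(z - delta) - norm.cdf(-z - delta)


def arl(beta):
    """ARL = 1 / (1 - beta); at delta = 0 this equals 1 / alpha (ARL_0)."""
    return 1.0 / (1.0 - beta)


for d in (0.0, 0.5, 1.0, 2.0):
    print(d, round(arl(beta_normal(d)), 2))
```

With α = 0.05 the in-control ARL is 1/0.05 = 20, and with α = 0.0027 (3σ limits) a one-sigma shift gives an ARL of about 44. The normal ARL curve is symmetric in δ.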
The algorithm used to estimate β(δ) when the variable follows an XGLD is as follows:
1. Using the algorithm presented in the previous section, estimate the XGLD parameters and calculate the control limits.
2. Let the shift be δ = ks, where s is the standard deviation of the set of all individual samples and k is a real number, usually in the range −3 to 3. Add this fixed value to all available data, $x_i' = x_i + \delta$, and estimate the XGLD parameters again for the shifted data set, giving the quantile function $x(y)$ of the shifted process.
3. Find $y_{Lower}$ and $y_{Upper}$ such that
$$x(y_{Lower}) = LCL, \qquad x(y_{Upper}) = UCL. \tag{8}$$
Then calculate $\beta(\delta) = y_{Upper} - y_{Lower}$, and finally $ARL_1 = 1/(1-\beta)$.
4. Repeat steps 2 and 3 for all values of k to obtain the $ARL_1$ values.
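Steps 2 and 3 can be sketched in Python as below. Two simplifications are assumptions of this sketch rather than the paper's procedure: instead of re-estimating all XGLD parameters after shifting the data, the fitted quantile function is shifted by δ (adding a constant to every observation shifts the quantile function by the same constant), and equation (8) is solved with SciPy's `brentq` root finder. The coefficients are the values printed in the Simulation section, s is a hypothetical standard deviation, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq


def signed_power(u, b):
    """Monotony power (u)^{*b} = sign(u) * |u|**b for a scalar u."""
    return float(np.sign(u) * abs(u) ** b)


def xgld_q(p):
    """Example XGLD quantile (coefficients as printed in the Simulation section)."""
    return (1.7337 + 0.6425 * p ** 0.6145 - 0.6145 * (1 - p) ** 0.5882
            + 0.0018 * signed_power(p - 0.5202, 0.0012))


def cdf_from_quantile(q, x):
    """y = F(x), i.e. the root of q(y) = x on [0, 1], as in equation (8)."""
    if x <= q(0.0):
        return 0.0
    if x >= q(1.0):
        return 1.0
    return brentq(lambda y: q(y) - x, 0.0, 1.0)


def beta_xgld(q, lcl, ucl, delta):
    """beta(delta) = y_Upper - y_Lower for the process shifted by delta.
    Assumption: the shifted quantile function is Q(p) + delta (no re-fitting)."""
    shifted = lambda y: q(y) + delta
    return cdf_from_quantile(shifted, ucl) - cdf_from_quantile(shifted, lcl)


def arl1_curve(q, lcl, ucl, s, ks):
    """ARL_1 = 1 / (1 - beta) for shifts delta = k * s."""
    curve = []
    for k in ks:
        b = beta_xgld(q, lcl, ucl, k * s)
        curve.append(float("inf") if b >= 1.0 else 1.0 / (1.0 - b))
    return curve


alpha = 0.05
lcl, ucl = xgld_q(alpha / 2), xgld_q(1 - alpha / 2)
s = 0.25  # hypothetical standard deviation of the individual samples
ks = np.arange(-3.0, 3.01, 0.5)
print([round(v, 2) for v in arl1_curve(xgld_q, lcl, ucl, s, ks)])
```

Because the fitted distribution can be skewed, the resulting $ARL_1$ curve need not be symmetric in k, unlike the normal-theory curve.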
Figure 2 shows the ARL curves under the normal and XGLD distributions. At every point the ARL equals $1/(1-\beta)$, so β is computed first. By the definition of the type II error, when the true mean of the distribution shifts by kσ, the area under the shifted density over the interval LCL < X < UCL equals β; therefore, β can be calculated for different values of k using the relationships given in the previous sections. As can be seen, when the data are assumed to be normally distributed, the ARL curve lies above the other two curves at every nonzero shift. This means that, after a system failure, a true alarm is raised only after a long delay, which increases failure costs.
In the XGLD ARL curve, β is calculated for each value of k (the deviation of the system from the mean) using the control limits obtained from the XGLD and the four-step algorithm above. As can be seen, for every k the value of β is lower than under the normal assumption, indicating that fewer defective products are produced before a true alarm.
In the ideal ARL curve, a failure is detected by the control system immediately after it occurs (at the first positive or negative values of k), and the production line stops. Such a control system does not exist in practice, and quality control research aims to find control systems that approach it, which is why it is called the ideal system. With such a control system, the probability of stopping the production line when it is in control (healthy) equals α, and the probability of a type II error (β), that is, of an out-of-control system continuing to produce, is zero.
As can be seen, the ARL curve is closer to the ideal curve when the XGLD rather than the normal distribution is fitted to the data. Unlike $ARL_1$, $ARL_0$ should be large, so that the probability of a false alarm is low. Since both charts use the same type I error, α = 0.05, $ARL_0 = 1/0.05 = 20$ for both, so this index cannot be used to compare their effectiveness. As shown in Figure 2, at k = 0 (when the system is completely healthy) the ARL of both charts is 20; that is, with either chart, one false alarm is expected on average every 20 in-control points. This value is 20 for any control chart constructed with the same α, regardless of the data distribution or the method used to construct the chart. At this point, $ARL_0$ and $ARL_1$ coincide. At other values of k (that is, when the system goes out of control), $ARL_1$ is lower for the proposed chart. This means that when the system is out of control and the XGLD is used to construct the control chart, the system stops earlier, with fewer defective products and lower costs.

Simulation
As discussed in the previous section, after estimating the values of α, β and c, the XGLD for the initial data was obtained as

$$x = 1.7337 + 0.6425\,p^{0.6145} - 0.6145\,(1-p)^{0.5882} + 0.0018\,(p - 0.5202)^{*0.0012}.$$

According to [18], in the XGLD, p is the cumulative probability of x, p = F(x). Therefore, values of x can be generated by drawing random values of p (0 < p < 1) and substituting them into this formula, so that the generated values of x follow the XGLD. After generating 1000 values $x_i$, the first to eighth moments were calculated. Then, using a procedure similar to that described in the previous section, the XGLD parameters were estimated from the simulated data. These estimates are compared with the true XGLD parameter values in Table 4. In almost all cases, there is no significant difference between the estimated and true values of the parameters; in some cases the difference is zero or very close to zero, and the overall average difference is 0.007, which is small. This suggests that the XGLD works well in the individual measurements setting and can be used as a precise estimator of the population distribution.
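The data-generation part of the simulation can be reproduced in outline with NumPy. This sketch draws the $p_i$ uniformly, maps them through the XGLD formula above, groups the $x_i$ into 22 equal-width classes and computes the first eight raw moments. The seed and the equal-width classes are assumptions of this sketch, and the re-estimation of the parameters from these moments is not shown; the function names are hypothetical.

```python
import numpy as np


def signed_power(u, b):
    """Monotony power (u)^{*b} = sign(u) * |u|**b."""
    return np.sign(u) * np.abs(u) ** b


def xgld_q(p):
    """XGLD quantile with the coefficients printed above."""
    p = np.asarray(p, dtype=float)
    return (1.7337 + 0.6425 * p ** 0.6145 - 0.6145 * (1 - p) ** 0.5882
            + 0.0018 * signed_power(p - 0.5202, 0.0012))


def simulate_xgld(n=1000, n_classes=22, seed=12345):
    """Draw p_i ~ U(0, 1), set x_i = Q(p_i), group into classes, compute raw moments 1..8."""
    rng = np.random.default_rng(seed)
    x = xgld_q(rng.uniform(size=n))
    counts, edges = np.histogram(x, bins=n_classes)
    raw_moments = np.array([np.mean(x ** m) for m in range(1, 9)])
    return x, counts, edges, raw_moments


x, counts, edges, moments = simulate_xgld()
print(counts)
print(np.round(moments, 4))
```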

Conclusion:
In this paper, a new XGLD-based approach is developed for constructing control charts in individual sampling cases. In many cases, it is not reasonable to take more than one sample at a time from the production line. In these situations, the X chart is usually used to control the process, but it may not be sensitive enough to detect small deviations. In addition, when the process distribution is not normal or is unknown, which is often the case in single sampling, the X chart can no longer be relied on. In this paper, the distribution of the data is estimated using the XGLD. Because this distribution is more flexible than the GLD (which is itself more flexible than many distributions), the distribution of the data is estimated precisely. Using this distribution, the control limits were calculated in a manner similar to that used by [19] for the GLD. A comparison of the ARL index for the conventional and proposed methods showed that the proposed method has lower $ARL_1$ values at all shifts considered.
The difference between this research and the method proposed in [19] is the use of the XGLD. Another contribution of this paper is its emphasis on using the XGLD for the statistical quality control of individual samples. The proposed method is also applicable when the statistical distribution of the manufactured items is unknown. This research uses the chi-square test to check goodness of fit. Another strength of the proposed chart is that its $ARL_1$ curve is not necessarily symmetric, whereas the $ARL_1$ curve under the normal assumption is necessarily symmetric. In individual sampling, the distribution of the data is often not normal, and in these cases the proposed method can help identify the data distribution. Even when the distribution is normal or close to normal, real data usually show some skewness or kurtosis; the XGLD can capture these features and yield a control chart that takes them into account. Finally, some data follow distributions that have no standard name (such as bimodal distributions). The flexibility of the XGLD makes it possible to approximate these distributions and construct control charts for them.

Figure 2. ARL curves for three cases: ideal (dotted), XGLD distribution (line) and normal distribution (continuous line)

Table 1 .
Observed and expected frequencies for the chi-square goodness-of-fit test.

Table 2 .
Estimated XGLD parameters from the sample data.

Table 3 .
Average, standard deviation and central moments, using the data and XGLD distribution.

Table 4 .
True and estimated values of the XGLD parameters.

The values of x generated in this way follow the XGLD. Accordingly, 1000 random values of x were generated with the following process:
1. Using random numbers, generate 1000 values between 0 and 1 and denote them $p_i$, $1 \le i \le 1000$.
2. Substitute each $p_i$ into the XGLD formula of the Simulation section to calculate the corresponding value $x_i$.
3. Categorize the $x_i$. In this simulation study, the number of categories was set to 22.