An Alternative Diagnostic Procedure for Meta-Regression

This paper proposes an alternative procedure for detecting outliers in meta-regression using penalized maximum likelihood with the smoothly clipped absolute deviation (SCAD) penalty function. A coordinate descent algorithm is implemented to estimate the parameters, with the cross-validation criterion used to determine the tuning parameter. Extensive simulation experiments demonstrate the usefulness of the proposed procedure as well as its improved power performance compared to previous procedures. The simulation results also show that performance improves with the number of studies and deteriorates as the heterogeneity between studies increases. An illustrative application to real data, implementing the proposed procedure and its competitors, is given.


Introduction
The effect size is the unit of analysis and the fundamental quantity of interest in meta-analysis. It measures the strength, magnitude, and direction of a relationship or difference between groups [9]. Although the effect size does not depend on aspects of the study design that may vary from study to study, such as the sample size, the effect of the intervention depends on the nature of the collected data, whether continuous, dichotomous, ordinal, counts, or time-to-event [14]. Meta-analysis is subject to bias due to the existence of extreme effect size values, which are often called "outliers". The presence of outliers may complicate the analysis and undermine its validity, accuracy, and robustness. To account for possible outliers, various methods have been proposed in the literature [4,5,12,7,2]. In many cases, a deletion diagnostic method has been used to identify the outliers. This method is based on deleting one or multiple observations at a time and then computing deletion statistics that measure the effect of the deleted observations on the parameter estimates or residuals. The popular diagnostic methods, graphical and numerical, used for detecting outliers in meta-analysis are given in the book of [13]. The problem of outliers and influential points in the mixed-effects model has been studied by [21], who developed a method based on the impact of excluding studies on various statistics such as the summary estimate, heterogeneity, and residuals.
Penalized Maximum Likelihood Estimation (PMLE) is widely used for the stabilization of estimators as well as for detecting outliers. [18] discussed the possibility of detecting outliers in regression models based on the mean shift model, which predicts the dependent variable by the usual linear model plus an outlier term γ_i: if γ_i = 0, then the ith observation is consistent and not an outlier; otherwise it is an outlier. PMLE has been used for outlier detection in various statistical models, including general profiles monitoring [23] and exponential regression for censored data [22].
In this paper, we develop an alternative procedure to detect outliers in meta-regression models using penalized maximum likelihood. We use the coordinate descent algorithm to estimate the parameters, and the cross-validation criterion to determine the tuning parameter. We demonstrate via simulation experiments the usefulness of the proposed procedure as well as its improved power performance compared to previous procedures.
The rest of this paper is organized as follows. Section 2 reviews the main effect models. Section 3 discusses outlier and influence diagnostics in meta-regression. Section 4 proposes an alternative procedure for outlier detection in meta-regression based on the PMLE technique. The power performance of the proposed procedure is investigated and compared with other available procedures via the simulations described in Section 5. Finally, in Section 6, we discuss an illustrative application to an educational meta-analysis.

Meta-regression models
Suppose y_1, y_2, . . . , y_k are the observed effect size estimates in a set of k independent studies, where

y_i = θ_i + e_i,    e_i ∼ N(0, σ_i^2),    (1)

θ_i is the true effect in the ith study, e_i is the within-study error, and σ_i^2 is the sampling variance of the ith study, which is assumed to be known [9]. If θ_i = θ for all i, then the true effect is homogeneous, which leads to the fixed-effect model. On the other hand, if the θ_i differ, then the true effects are heterogeneous, which leads to the random-effects model

θ_i = µ + u_i,    u_i ∼ N(0, τ^2),

where µ is the mean of the true effects and τ^2 is the amount of heterogeneity between the studies. If τ^2 = 0, then the true effects are homogeneous, and thus µ = θ.
In some situations, the effect size is moderated by known variables, which can be formulated as

θ_i = β_0 + β_1 X_i1 + β_2 X_i2 + · · · + β_p X_ip + u_i,    u_i ∼ N(0, τ^2),    (2)

where X_ij is the value of the jth moderator variable for the ith study, β_0 is the expected effect size when X_ij = 0 for j = 1, 2, · · · , p, and β_j is a coefficient explaining how the jth moderator influences the size of the true effect for a one-unit increase in X_ij. Since the true effect sizes are now a function of both fixed and random effects, model (2) is typically called the mixed-effects model. The selection of the effect model depends on the variation in the true effect sizes (heterogeneity). There are several tests and indexes of heterogeneity [14]. The most common test is the Q test, given by

Q = Σ_{i=1}^{k} w_i (y_i − µ̂)^2,    w_i = 1/σ_i^2,

where µ̂ = Σ w_i y_i / Σ w_i. Under the assumption of homogeneity, Q has a chi-square distribution with k − 1 degrees of freedom. Alternatively, one can use the I^2 index

I^2 = max(0, (Q − df)/Q) × 100%,

where df is the degrees of freedom of the Q test; the I^2 index is a percentage ranging from 0% to 100%.
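To make the heterogeneity assessment concrete, the following Python sketch computes the Q statistic and the I^2 index from effect sizes and known sampling variances (the paper itself works in R; the function names here are illustrative).

```python
import numpy as np

def q_test(y, v):
    """Cochran's Q statistic for effect sizes y with known sampling
    variances v, computed under the fixed-effect model."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v                          # inverse-variance weights
    mu_hat = np.sum(w * y) / np.sum(w)   # weighted mean effect size
    return np.sum(w * (y - mu_hat) ** 2)

def i_squared(q, k):
    """I^2 index: percentage of total variation attributable to
    heterogeneity, truncated at 0 when Q < df."""
    df = k - 1
    return max(0.0, (q - df) / q) * 100.0
```

For example, y = (1, 2, 3) with unit variances gives Q = 2 on df = 2, so I^2 = 0%.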

Outliers and diagnostics in meta-regression
In the context of meta-analysis, an outlier refers to a study that does not fit the model of the other studies and that cannot be explained by an obvious characteristic of the study [13]. Some outlying values influence the results of the analysis; these values are also called "influential points". Despite the limited interest in the problem of outliers in meta-analysis, most of the published work has considered fixed- and random-effects models [13,4,16,12,7]. Viechtbauer and Cheung [21] discussed the extension of some of the outlier and influential point detection procedures from multiple regression to meta-regression, as described in the following subsection.

Outliers in meta-regression
The adaptation of several types of residuals from multiple linear regression to meta-regression is a helpful tool to detect outliers, as follows:

1. Studentized residuals: the ratio of the raw residual e_i = y_i − µ̂_i to its standard error,

t_i = (y_i − µ̂_i) / sqrt((τ̂^2 + σ_i^2)(1 − h_i)),

where µ̂_i is the predicted average true effect size for the ith study and h_i is the leverage of the ith study. If an outlier exists, then the estimate of τ^2 will be inflated and µ̂_i will be pulled toward the y_i value.

2. Studentized deleted residuals:

t_i = (y_i − µ̂_{i(−i)}) / sqrt(τ̂^2_{(−i)} + σ_i^2 + Var(µ̂_{i(−i)})),

where µ̂_{i(−i)} is the predicted average true effect size for the ith study based on the model that excludes the ith study during fitting.

Although the studentized residual method is simple, it may fail when there are multiple outliers.
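As a sketch of the leave-one-out logic behind the studentized deleted residuals, the Python code below refits a simple random-effects model (no moderators) without each study in turn, using the DerSimonian-Laird estimator of τ^2 for tractability; metafor's REML-based implementation differs in details, so this is illustrative only.

```python
import numpy as np

def dl_tau2(y, v):
    """DerSimonian-Laird estimate of the between-study variance tau^2."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v
    mu = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu) ** 2)
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - df) / c)

def studentized_deleted_residuals(y, v):
    """t_i = (y_i - mu_(-i)) / sqrt(v_i + tau2_(-i) + Var(mu_(-i))),
    where mu_(-i) is the pooled estimate with study i removed."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    k = len(y)
    t = np.empty(k)
    for i in range(k):
        mask = np.arange(k) != i
        y_d, v_d = y[mask], v[mask]
        tau2_d = dl_tau2(y_d, v_d)            # refit heterogeneity without study i
        w_d = 1.0 / (v_d + tau2_d)
        mu_d = np.sum(w_d * y_d) / np.sum(w_d)
        var_mu_d = 1.0 / np.sum(w_d)
        t[i] = (y[i] - mu_d) / np.sqrt(v[i] + tau2_d + var_mu_d)
    return t
```

With one grossly shifted effect among otherwise similar studies, that study's deleted residual dominates the others in absolute value.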

Identifying influential studies in meta-regression
The authors of [8] proposed a row-deletion approach to detect possible influential points based on statistical measures of the fitted linear regression model, comparing the value of each measure for the full data with its value after deleting one point at a time. Viechtbauer and Cheung [21] adapted the known deletion diagnostics from linear regression to the context of meta-analysis to identify possible influential studies.
The following are the popular diagnostics implemented in the metafor R package [20]:

1. DFFITS: it measures the difference between the predicted mean effect for the ith study with and without the ith study included in the model, and it is given by

DFFITS_i = (µ̂_i − µ̂_{i(−i)}) / sqrt(h_i (τ̂^2_{(−i)} + σ_i^2)).

2. Cook's distance: it considers the change in the fitted values when the ith study is excluded, and it is given by

D_i = (β̂ − β̂_{(−i)})^T (X^T W X) (β̂ − β̂_{(−i)}),

where β̂ and β̂_{(−i)} are the vectors of parameter estimates for the full data and after deleting the ith study, respectively, X is the k × (p + 1) design matrix that includes ones in the first column (corresponding to β_0) and the values of the moderators in the other p columns, and W is a diagonal matrix with ith diagonal element 1/(σ_i^2 + τ̂^2). The statistic D_i approximately follows a chi-squared distribution with p + 1 degrees of freedom, where p is the number of moderators included in the mixed-effects model. If D_i exceeds the value of χ^2_{p+1} at a desired level of significance, then the ith study is a suspected outlier and should be further examined.

3. DFBETAS: the change in the parameter estimates can also be measured as each study is deleted in turn. The difference in coefficients (DFBETAS) is calculated as

DFBETAS_{ij} = (β̂_j − β̂_{j(−i)}) / sqrt([(X^T W X)^{-1}]_{jj}),

where [(X^T W X)^{-1}]_{jj} is the (j + 1)th diagonal element of the matrix (X^T W X)^{-1}. For small to medium data sets, a value of DFBETAS_{ij} larger than 1 may indicate an influential point.

4. COVRATIO: a change in the variance-covariance matrix of the parameter estimates can be quantified by

COVRATIO_i = det(Var(β̂_{(−i)})) / det(Var(β̂)),

where Var(β̂) = (X^T W X)^{-1}. Removing the ith study yields a more precise effect size estimate when COVRATIO_i is less than 1.

5. R_i: the change in the estimate of the (residual) heterogeneity can be measured with the exclusion of each study; R_i is calculated as a percentage of change as follows:

R_i = 100% × (τ̂^2 − τ̂^2_{(−i)}) / τ̂^2.

If the ith study is an influential point, then its removal will cause a large decrease in the residual heterogeneity.
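These deletion diagnostics all reduce to comparing weighted least-squares fits with and without one study. The Python sketch below computes Cook's distances for a meta-regression, holding τ^2 fixed across refits for simplicity (metafor re-estimates τ^2 on each deletion), so it is an illustrative approximation rather than the package's exact computation.

```python
import numpy as np

def wls_beta(X, y, w):
    """Weighted least-squares estimate (X^T W X)^{-1} X^T W y."""
    XtW = X.T * w                      # broadcasts w across rows of X^T
    return np.linalg.solve(XtW @ X, XtW @ y)

def cooks_distances(X, y, v, tau2=0.0):
    """D_i = (b - b_(-i))^T (X^T W X) (b - b_(-i)) for each study i,
    with W = diag(1 / (v_i + tau2)); tau2 is held fixed for simplicity."""
    X = np.asarray(X, float)
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / (v + tau2)
    b = wls_beta(X, y, w)              # full-data estimate
    XtWX = (X.T * w) @ X
    k = len(y)
    d = np.empty(k)
    for i in range(k):
        m = np.arange(k) != i
        b_i = wls_beta(X[m], y[m], w[m])   # leave-one-out estimate
        diff = b - b_i
        d[i] = diff @ XtWX @ diff
    return d
```

On a small synthetic meta-regression with one shifted effect, the largest Cook's distance points to the shifted study.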

Penalized maximum likelihood estimation for outliers detection
In regression analysis, when the number of covariates is larger than the sample size, maximum likelihood estimation does not work well. One way to overcome this problem is to reduce the number of explanatory variables by searching over all possible subsets [17,3]. Variable selection can also be carried out by imposing a penalty on the likelihood function: instead of maximizing the log-likelihood ℓ(θ), one maximizes

M(θ) = ℓ(θ) − λ Σ_j P(|θ_j|),

where M is the objective function, λ is a scalar that controls the tradeoff between the two parts, and P is a function that penalizes what one would consider less realistic values of the unknown parameters. Penalized Maximum Likelihood Estimation (PMLE) is widely used for the stabilization of estimators as well as for detecting outliers. The performance of PMLE depends on the value of the tuning parameter. One way of selecting the tuning parameter λ is cross-validation (CV), which uses the leave-one-out method [19]. Here, we propose an alternative, efficient method to identify outliers based on penalized maximum likelihood estimation.

Model formulation
Suppose y = (y_1, y_2, . . . , y_k)^T is the vector of observed effect sizes and x_j = (x_j1, x_j2, . . . , x_jk)^T is the vector of values of the jth moderator for j = 1, 2, . . . , p. Let β be the vector of unknown meta-regression parameters, ε = (ε_1, ε_2, . . . , ε_k)^T the vector of errors (combining the homogeneous and heterogeneous components), and τ^2 the variance between studies. A k × 1 shift vector γ is incorporated into the meta-regression to indicate whether each observation is an outlier (the non-zero components of γ correspond to the outliers). The interest here is to estimate β with (p + 1) parameters, γ with k parameters, and τ^2, so the number of unknown parameters is k + p + 2, which is greater than the sample size k. A possible approach to this problem is to introduce a penalty: instead of maximizing ℓ(β, γ, τ^2), we maximize

M_λ(β, γ, τ^2) = ℓ(β, γ, τ^2) − λ Σ_{i=1}^{k} P(|γ_i|).    (3)

For simplicity, we work with the case of one moderator (i.e., p = 1), so we have to estimate k + 3 parameters. Consider the simple mixed-effects model

y_i = β_0 + β_1 x_i + γ_i + ε_i,

where γ_i is an unknown shift parameter incorporated to indicate whether the ith observation is an outlier. Since y_i ∼ N(β_0 + β_1 x_i + γ_i, τ^2 + σ_i^2), the log-likelihood function is given by

ℓ(β, γ, τ^2) = −(1/2) Σ_{i=1}^{k} [ log(2π(τ^2 + σ_i^2)) + (y_i − β_0 − β_1 x_i − γ_i)^2 / (τ^2 + σ_i^2) ].

Hence, the penalized log-likelihood is given by

M_λ(β, γ, τ^2) = −(1/2) Σ_{i=1}^{k} [ log(2π(τ^2 + σ_i^2)) + (y_i − β_0 − β_1 x_i − γ_i)^2 / (τ^2 + σ_i^2) ] − λ Σ_{i=1}^{k} P(|γ_i|).    (4)

Parameters estimation
The smoothly clipped absolute deviation (SCAD) penalty function and its first derivative are given, respectively, by

P_λ(x) = λ|x|, if |x| ≤ λ;
P_λ(x) = (2aλ|x| − x^2 − λ^2) / (2(a − 1)), if λ < |x| ≤ aλ;
P_λ(x) = λ^2 (a + 1)/2, if |x| > aλ,

and

P'_λ(x) = λ { I(|x| ≤ λ) + [(aλ − |x|)_+ / ((a − 1)λ)] I(|x| > λ) } sgn(x),

where sgn(x) = 1 if x > 0 and −1 if x < 0 [15]. The SCAD penalty is nonconvex, continuous, and differentiable on (−∞, 0) ∪ (0, ∞), but not differentiable at 0. Its derivative vanishes outside the interval [−aλ, aλ], and it produces estimators with continuity, sparsity, and unbiasedness for the large coefficients [22]. Thus, using the SCAD penalty in the penalized log-likelihood (4), we obtain the objective function

M_{λ_k}(β, γ, τ^2) = ℓ(β, γ, τ^2) − Σ_{i=1}^{k} P_{λ_k}(|γ_i|),

where λ_k is used to emphasize the dependence of λ on k. Note that maximizing M_{λ_k} is equivalent to minimizing −M_{λ_k}. Since the minimization is nontrivial, we use an iterative algorithm that alternates between β and γ. To obtain the estimate of γ at each iteration, we use the coordinate descent algorithm, one of the most efficient algorithms for processing large-scale data due to its simple operations and fast convergence [22]. The coordinate descent algorithm proceeds as follows:

Step 1: Set an initial estimate β̂^(0), obtained from the mixed-effects model ignoring possible outliers, and set m = 1.

Step 2: Given β̂^(m−1), update each component of γ̂^(m) in turn by minimizing the objective with respect to γ_i alone, a univariate SCAD-penalized problem with a closed-form thresholding solution applied to the residual y_i − β̂_0^(m−1) − β̂_1^(m−1) x_i.

Step 3: Given γ̂^(m), update β̂^(m) (and τ̂^2) by fitting the mixed-effects model to the shifted responses y − γ̂^(m), set m = m + 1, and repeat Steps 2 and 3 until convergence.

Once convergence is reached, we obtain the estimate of γ, and we can then judge the coefficient for every study: if the ith component of γ̂ is close to 0, the observation is not an outlier; otherwise, it is a suspected outlier.
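The alternating scheme for the mean-shift model can be sketched in Python as follows, assuming for simplicity equal error variances (so the weights drop out of the thresholding step); `scad_threshold` is the closed-form univariate SCAD solution of Fan and Li, and the nonzero entries of the returned γ flag suspected outliers. This is an illustrative sketch, not the authors' R implementation.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """Closed-form minimizer of 0.5*(z - t)^2 + SCAD_lam(|t|) (Fan & Li)."""
    az = abs(z)
    if az <= 2 * lam:
        return np.sign(z) * max(az - lam, 0.0)   # soft-thresholding region
    if az <= a * lam:
        return ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return z                                     # large values left unshrunk

def fit_mean_shift(x, y, lam, a=3.7, n_iter=100, tol=1e-8):
    """Alternating estimation for y_i = b0 + b1*x_i + gamma_i + e_i with a
    SCAD penalty on gamma; equal error variances assumed for simplicity.
    Nonzero entries of gamma flag suspected outliers."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X = np.column_stack([np.ones_like(x), x])
    gamma = np.zeros_like(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # Step 1: ignore outliers
    for _ in range(n_iter):
        # Step 2: update each gamma_i by SCAD thresholding its residual
        r = y - X @ beta
        gamma_new = np.array([scad_threshold(ri, lam, a) for ri in r])
        # Step 3: update beta by least squares on the shifted responses
        beta_new = np.linalg.lstsq(X, y - gamma_new, rcond=None)[0]
        done = (np.max(np.abs(beta_new - beta)) < tol and
                np.max(np.abs(gamma_new - gamma)) < tol)
        beta, gamma = beta_new, gamma_new
        if done:
            break
    return beta, gamma
```

On a noise-free line with one effect shifted by 3, the procedure recovers the line and isolates the shift in a single nonzero component of γ.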
The estimation of the model parameters, including the selection of the tuning parameter and the implementation of the coordinate descent algorithm, was conducted using the R software, with partial use of the ncvreg R package [10]. The proposed procedure extends straightforwardly to other meta-regression models with other settings, including the model type and the number of predictors.

Simulation study
The purpose of our simulations is to demonstrate the usefulness of the proposed procedure as well as its improved power performance compared to previous procedures. We conducted simulation studies to investigate the power performance of five outlier and influence diagnostic procedures, namely studentized deleted residuals, DFFITS values, Cook's distances, COVRATIO, and the penalized maximum likelihood method [11].
The settings of the simulation study are summarized as follows. The study-specific effects y_i are simulated from the random-effects meta-regression with µ = 0.5, true effects β_1 = 0.5 and β_2 = 1, and variance (σ_i^2 + τ^2), i = 1, 2, . . . , k, where the σ_i^2 are independently generated from a χ^2_1 distribution multiplied by 0.25 and then restricted to the interval (0.009, 0.6). We consider four values of the number of studies, k = 10, 20, 30, and 50, and six values of τ^2 = 0, 0.2, 0.4, 0.6, 0.8, and 1. The contamination of the generated meta-data with an outlier at position d is carried out as y*_d = y_d + ω, where y*_d is the contaminated value of the effect y_d at position d and ω is the contamination level, 0 ≤ ω ≤ 1. The simulated data are analyzed using the function rma.glmm() from the metafor R package [20]: the mixed-effects model is fitted and the five outlier detection procedures are then applied. The power performance is investigated by calculating the percentage of correct detections of the contaminated study at position d.
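As an illustration of this design, the following Python sketch generates one contaminated data set (the paper uses R; the distribution of the moderators is not stated above, so standard-normal moderators are an assumption here).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_meta(k, tau2, omega, d, beta=(0.5, 1.0), mu=0.5):
    """Generate one contaminated meta-regression data set following the
    stated design: sigma_i^2 ~ 0.25 * chi^2_1 truncated to (0.009, 0.6),
    y_i from the mixed-effects model, then y_d shifted by omega."""
    # rejection-sample the within-study variances into (0.009, 0.6)
    sigma2 = np.empty(k)
    for i in range(k):
        s = 0.25 * rng.chisquare(1)
        while not (0.009 < s < 0.6):
            s = 0.25 * rng.chisquare(1)
        sigma2[i] = s
    # moderators: standard normal is an assumption, not from the paper
    x1, x2 = rng.standard_normal(k), rng.standard_normal(k)
    theta = mu + beta[0] * x1 + beta[1] * x2
    y = theta + rng.normal(scale=np.sqrt(sigma2 + tau2))
    y[d] += omega            # contaminate the study at position d
    return y, x1, x2, sigma2
```

Each replication of the simulation would call this once per (k, τ^2, ω) setting and record whether the diagnostic flags study d.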
The results of the simulation study reveal that the performance of all considered statistics has a direct relationship with the contamination level ω. Furthermore, the proposed PMLE method outperforms the other statistics, followed by COVRATIO, as shown in Figure 1. The power performance of the PMLE procedure has a direct relationship with the number of studies and an inverse relationship with the value of τ^2, as shown in Figures 2 and 3. The same behavior was observed for the other values of k and τ^2.

Illustrative application
The authors of [6] conducted a meta-analysis of 48 school-based writing-to-learn programs, aiming to investigate the effectiveness of writing-to-learn interventions on academic achievement. Their results revealed that writing can have a small, positive impact on conventional measures of academic achievement. Recently, [1,21] considered the analysis of 26 studies, a subset of the 48 studies, to illustrate their procedures for detecting outliers and influential points. Here, we consider the same data set consisting of 26 studies and 6 variables, summarized as follows:

Year: The year the study was conducted, between 1980 and 1998.

College: A dummy variable indicating whether the sample consisted of high-school or college students; 17 (65.4%) of the studies were conducted in colleges and 9 (34.6%) in schools.

Length: The length of the intervention in weeks, between 1 and 24 weeks, with mean 9.62 weeks and standard deviation 6.88 weeks.

Meta: A dummy variable indicating whether the intervention incorporated prompts for meta-cognitive reflection; in 19 (73.1%) of the studies it did and in 7 (26.9%) it did not.

The effectiveness, y_i: The effect size of each study, quantified as a continuous standardized mean difference d_i, with mean 0.30, standard deviation 0.39, and range from −0.32 to 1.46. Figure 4 shows the forest plot of the effect sizes y_i, presented by black boxes at the centers of 95% confidence intervals with whiskers. The size of each black box indicates precision: larger symbols indicate greater precision of the effect size determined from the corresponding study.

The sampling variance, σ_i^2: The variance of the standardized mean difference.

We fit the mixed-effects model for these data, considering the effect size with three explanatory variables: "length of intervention" and the two dummy variables "college" and "meta-cognition". In order to determine the appropriate meta-regression model, the heterogeneity test Q is conducted. The observed variation Q = 44.14 exceeds χ^2_{25,0.05} = 37.652. Furthermore, the expected variation equals the degrees of freedom, the number of studies minus 1 (df = 25), so Q − df = 19.14 > 0. This is a significant indication of the existence of variation between the studies. Furthermore, the estimated variance of the observed effect sizes is τ̂^2 = 0.0472. Thus, the mixed-effects model is used to fit the data. Table 1 shows the coefficient estimates of the fitted model together with their 95% confidence intervals.
The estimates do not differ significantly from zero, as each 95% confidence interval includes zero. Therefore, we conclude that none of the estimated coefficients is significant. Figure 5 shows the residual plots, where two points are inconsistent with the other residuals and are candidates to be outliers.

These two studies also deviate from the pattern of the remaining residuals in the normal Q-Q plot of the mixed-effects model (sample quantiles against theoretical quantiles, Figure 6); hence, these two studies can be considered outliers. Figure 7 shows the plots of DFBETAS for the four coefficients of the mixed-effects model. The intercept coefficient reveals the influence of studies 7, 15, 17, 19, and 25. The coefficient of length of intervention reveals the influence of studies 6, 8, 12, 13, 15, 19, 20, 22, 2, 24, and 25. The coefficient of meta-cognition reveals the influence of studies 7, 22, and 25. The coefficient of college reveals the influence of studies 7, 8, 12, 13, 15, 19, 22, and 25. In order to apply the PMLE, the data set needs to be augmented with 26 dummy variables in matrix form, S_ij, where i, j = 1, 2, . . . , 26, with S_ij = 1 for i = j and S_ij = 0 otherwise. The SCAD penalty function is used with the scalar fixed at a = 3.7, following [22]. Figure 8 shows the cross-validation error for different values of the tuning parameter λ; the value of λ that minimizes the cross-validation error is exp(−2.5) ≈ 0.082, with four selected variables. For further illustration, Figure 9 shows how the variables enter the model one at a time; at any given value of λ, several coefficients are zero.
Using the optimal tuning parameter λ and fitting the meta-regression model, we obtain the coefficient estimates listed in Table 2. There are four non-zero coefficients, corresponding to the length of intervention, S_7, S_18, and S_25. Therefore, the three studies numbered 7, 18, and 25 are identified as outliers, with the largest magnitude belonging to study 25.
The results of the different outlier detection procedures are consistent: the three studies numbered 7, 18, and 25 are outliers. In order to determine the appropriate meta-regression model, the heterogeneity test Q is conducted on the reduced data set. The observed variation Q = 24.12 is less than the corresponding chi-square critical value, so there is no significant indication of residual heterogeneity. The estimates of the model coefficients for the reduced data are given in Table 3. The estimates show that the meta-cognition and college variables become significant.

Conclusion
The problem of diagnostics in meta-analysis still needs further investigation. The proposed procedure was introduced as an alternative to other available procedures; it considers the whole data set and can identify multiple outliers in one run. Extensive simulation experiments demonstrate the usefulness of the proposed procedure as well as its improved power performance compared to previous procedures.