Generalizing the properties of the Finite Iterative Method for the Computation of the Covariance Matrix Implied by a Recursive Path Model

This paper generalizes the properties of the correlation matrix implied by a recursive path analysis model obtained using the Finite Iterative Method to the covariance case, where variables are no longer supposed to be standardized. We show that the implied covariance matrix computed using the Finite Iterative Method is affine with respect to each parameter of the considered model. Moreover, several other properties derive from this affinity and are used to simplify the computation of the first as well as the second derivatives of the fit function used in the estimation of the model's parameters. Finally, we illustrate the advantages of the proposed properties compared to the classical approximation used to compute the aforementioned derivatives through a numerical example.


Introduction
Path Analysis Model (PAM) is a set of statistical methods used to evaluate causal relationships between variables measured on the same set of individuals [1][2][3]. PAM may be seen as an extension of multiple regression models in the sense that variables can be both dependent and independent. This is not the case for multiple regression models, where a single dependent variable is explained by several independent variables. On the other hand, PAM is a special case of Structural Equation Modelling (SEM) [1,2].
PAM was founded in the first half of the 20th century by the geneticist and statistician Sewall Wright [4,5], and it has grown in popularity ever since. Since the 1970s, many articles have presented applications of PAM in many fields, such as sociology [6], psychology [3], epidemiology [7], and others [1,8,9]. Figure 1 represents an example of PAM with two exogenous variables and two endogenous variables. In order to proceed with the PAM, five essential steps are carried out [2]: specification, identification, estimation, evaluation, and modification. The present paper focuses on the estimation step, which is the core of the whole process. It consists of finding a numerical value for each parameter by minimizing a fit function that measures the difference between two matrices:
• The empirical covariance matrix, denoted by S and obtained from the data,
• The covariance matrix implied by the model, denoted by Σ(θ) and computed as a function of the parameters of the considered model.
Here θ is the vector of the parameters to be estimated. Two fit functions are widely used for this purpose: the Unweighted Least Squares ($F_{ULS}$) and the Generalized Least Squares ($F_{GLS}$) [10]. They are respectively defined as follows:

$$F_{ULS} = \frac{1}{2}\,\mathrm{tr}\big[(S - \Sigma(\theta))^2\big] \qquad (1)$$

$$F_{GLS} = \frac{1}{2}\,\mathrm{tr}\big[\big((S - \Sigma(\theta))\,S^{-1}\big)^2\big] \qquad (2)$$

However, other functions exist, such as the Maximum Likelihood ($F_{ML}$) [10]. This paper considers the two functions defined in (Eqs. 1) and (2). These two functions are represented in compact form as follows:

$$F_{WLS} = \frac{1}{2}\,\mathrm{tr}\big[\big((S - \Sigma(\theta))\,K\big)^2\big] \qquad (3)$$

That is, $F_{WLS} = F_{ULS}$ when $K = I$ and $F_{WLS} = F_{GLS}$ when $K = S^{-1}$. Given the complexity of the analyzed models, the explicit vector which minimizes the function defined in (Eq. 3) cannot be found. In practice, the minimization task is performed through an optimization procedure. In this paper, the Newton-Raphson procedure will be considered. It is defined as follows [11]:

$$\theta^{(s+1)} = \theta^{(s)} - \big(H^{(s)}\big)^{-1} g^{(s)} \qquad (4)$$

where $\theta^{(s)}$, $g^{(s)}$ and $H^{(s)}$ are respectively the vector of the model's parameters, the gradient vector and the Hessian matrix at the $s$-th iteration, $s = 1, 2, \ldots$. The procedure starts by choosing the vector $\theta^{(1)}$ arbitrarily. Then, new vectors $\theta^{(2)}, \theta^{(3)}, \ldots$ are successively generated following (Eq. 4). The procedure is iterated until the quantity $\|g^{(s)}\|$ is smaller than a given threshold (usually $\epsilon = 10^{-5}$). Here, the elements of $g$ and $H$ are respectively the first and second derivatives of the $F_{WLS}$ defined in (Eq. 3) with respect to each parameter and each pair of parameters:

$$g_i = \frac{\partial F_{WLS}}{\partial \theta_i} = -\mathrm{tr}\Big[(S - \Sigma(\theta))\,K\,\frac{\partial \Sigma(\theta)}{\partial \theta_i}\,K\Big] \qquad (5)$$

$$H_{ij} = \frac{\partial^2 F_{WLS}}{\partial \theta_i \,\partial \theta_j} = \mathrm{tr}\Big[K\,\frac{\partial \Sigma(\theta)}{\partial \theta_i}\,K\,\frac{\partial \Sigma(\theta)}{\partial \theta_j}\Big] - \mathrm{tr}\Big[(S - \Sigma(\theta))\,K\,\frac{\partial^2 \Sigma(\theta)}{\partial \theta_i \,\partial \theta_j}\,K\Big] \qquad (6)$$

From equations (Eqs. 3), (5) and (6), it is clear that the computation of Σ(θ) is fundamental. Usually, Σ(θ) is computed by the Jöreskog formula based on the so-called reduced form [12,13]. Recently, El Hadri & Hanafi [14,15] introduced an alternative method called the Finite Iterative Method (FIM) to compute this matrix. Furthermore, El Hadri et al. [16] established that Σ(θ) computed by FIM is affine with respect to each path coefficient when all variables are standardised. In the present paper, we generalize this affinity property when the variables are not supposed to be standardized.
In addition, we prove that Σ(θ) computed by FIM is also affine with respect to the covariance between each pair of exogenous variables. The paper is structured as follows: Section 2 introduces the notations used in PAM, defines the notion of the covariance matrix implied by a PAM, and recalls the FIM algorithm for the computation of this matrix. Section 3 presents the properties of the implied covariance matrix that generalize those previously established in the correlation case. Section 4 illustrates, with an example, these properties and their advantages in terms of the computation of the derivatives of the fit function. Section 5 presents a study using simulated data to highlight these advantages. Finally, we conclude with a summary and some perspectives.
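To make this estimation machinery concrete, the compact fit function of (Eq. 3) and the Newton-Raphson update of (Eq. 4) can be sketched in a few lines of Python with numpy. This is a minimal illustration, not the authors' implementation; the function names are ours:

```python
import numpy as np

def f_wls(S, Sigma, K):
    """Compact fit function of Eq. 3: F_WLS = 1/2 tr[((S - Sigma) K)^2].
    K = I yields F_ULS (Eq. 1); K = inv(S) yields F_GLS (Eq. 2)."""
    D = (S - Sigma) @ K
    return 0.5 * np.trace(D @ D)

def newton_raphson(theta, grad, hess, eps=1e-5, max_iter=100):
    """Newton-Raphson update of Eq. 4, stopped when ||g|| < eps."""
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < eps:
            break
        # theta_(s+1) = theta_(s) - H^{-1} g, via a linear solve
        theta = theta - np.linalg.solve(hess(theta), g)
    return theta
```

The second routine only assumes that callables returning g(θ) and H(θ) are available; the remainder of the paper is precisely about computing those derivatives exactly rather than approximating them.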

The covariance matrix implied by the model
In the present section, we begin by recalling the basic notations and vocabulary used in PAM. Then, we define the vector of parameters and the notion of the covariance matrix implied by a PAM.

Notations and vocabulary
The basic notations used in PAM can be found in [1,2]. The PAM contains three types of variables:
1. Exogenous variable (ξ): a variable that is not influenced by other variables in the model [2,3].
2. Endogenous variable (η): a variable that is modified or determined by its relationships with other variables in the model [2]. The endogenous variables are explained by a combination of exogenous variables and unspecified influences captured by a disturbance.
3. Disturbance term (or residual error term) (ζ): a variable that represents the part of an endogenous variable that is not explained by the model [2].
A PAM is represented algebraically by a system of multiple regression equations as follows:

$$\eta_j = \sum_{i=1}^{q} \gamma_{ji}\,\xi_i + \sum_{k \neq j} \beta_{jk}\,\eta_k + \zeta_j, \qquad j = 1, \ldots, p \qquad (7)$$

where q and p are respectively the number of exogenous and endogenous variables. System (7) can be formulated in the following compact form:

$$\eta = \Gamma \xi + B \eta + \zeta \qquad (8)$$

where ξ, η, and ζ are respectively the vectors of the exogenous variables, the endogenous variables, and the disturbances. Further assumptions on these vectors are made (see [1,14]). In addition, Γ denotes the (p × q) matrix of the model's parameters linking the endogenous variables to the exogenous variables, and B denotes the (p × p) matrix of the model's parameters relating the endogenous variables to each other. Moreover, Φ = E[ξξ^t] denotes the (q × q) covariance matrix of the exogenous variables. Finally, Ψ = E[ζζ^t] denotes the (p × p) covariance matrix of the disturbances. The vector of parameters is defined as follows:

Definition 1. The vector θ composed of the non-null elements of the matrices Φ, Γ, B and Ψ is called the vector of parameters of the PAM.

Definition 2. If the matrix B is strictly lower triangular [2], then the PAM is called a Recursive Path Analysis Model (RPAM).
The present paper is limited to RPAM. In addition, and for the sake of simplicity, Σ(θ) is denoted by Σ. This matrix is the ((p + q) × (p + q)) matrix defined as follows:

$$\Sigma = \begin{pmatrix} E[\xi\xi^t] & E[\xi\eta^t] \\ E[\eta\xi^t] & E[\eta\eta^t] \end{pmatrix} \qquad (9)$$

As aforementioned, Jöreskog [13,17,18] proposed a general formula to compute Σ based on the reduced form. It is a compact form expressing the vector of endogenous variables η as a function of the vector of exogenous variables ξ and the vector of disturbances ζ. Then the four blocks of Σ are computed separately.
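As an illustration, the reduced form η = (I − B)^{-1}(Γξ + ζ) lets the four blocks of Σ be assembled as follows. This is a minimal numpy sketch under the standard assumptions E[ξζ^t] = 0 and I − B invertible (for a RPAM, I − B is unit lower triangular, hence always invertible); the function name is ours:

```python
import numpy as np

def implied_cov_joreskog(Phi, Gamma, B, Psi):
    """Implied covariance via the reduced form eta = (I - B)^{-1}(Gamma xi + zeta);
    the four blocks of Sigma are computed separately."""
    p = B.shape[0]
    R = np.linalg.inv(np.eye(p) - B)                  # (I - B)^{-1}
    S_ee = R @ (Gamma @ Phi @ Gamma.T + Psi) @ R.T    # endogenous block E[eta eta^t]
    S_xe = Phi @ Gamma.T @ R.T                        # cross block E[xi eta^t]
    return np.block([[Phi, S_xe], [S_xe.T, S_ee]])
```

For instance, with a single exogenous and a single endogenous variable (Φ = 1, Γ = 0.5, B = 0, Ψ = 0.75), the implied matrix has unit variances and an implied covariance of 0.5.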
Another way to compute Σ is the method recently introduced by El Hadri and Hanafi [14,15], called the Finite Iterative Method. It consists of filling in the blocks of Σ iteratively using an algorithm whose number of iterations is equal to p. Comparisons between FIM and the Jöreskog formula are given in [15]. Later, El Hadri et al. [16] demonstrated several properties of the correlation matrix implied by a RPAM computed by FIM. Furthermore, Iaousse and El Hadri [18] applied this method to RPAM with correlated errors. Thereafter, El Hadri & Iaousse generalized FIM to compute the covariance matrix implied by a structural recursive model with latent variables [17]. In what follows, the FIM algorithm is presented.

Finite Iterative Method
El Hadri and Hanafi [15] have demonstrated that Σ can be computed iteratively. To make this clear, we denote by A the (p × (q + p)) matrix of structural parameters defined as:

$$A = \begin{pmatrix} \Gamma & B \end{pmatrix}$$

FIM is then defined by the following p iterations:

Algorithm 1 Finite Iterative Method (El Hadri and Hanafi 2015) [14]
Initialization: Σ_{1:q,1:q} = Φ
Repeat for k = 1, ..., p:
  Step 1: Σ_{q+k,1:q+k−1} = A_{k,1:q+k−1} Σ_{1:q+k−1,1:q+k−1}
  Step 2: Σ_{1:q+k−1,q+k} = (Σ_{q+k,1:q+k−1})^t
  Step 3: Σ_{q+k,q+k} = S_{q+k,q+k}

FIM starts by setting Σ_{1:q,1:q} = Φ (the covariance matrix among the q exogenous variables), where Σ_{1:q,1:q} is the sub-matrix of Σ obtained by extracting the first q rows and the first q columns. Thereafter, the sub-row Σ_{q+k,1:q+k−1} of the (q + k)-th row of Σ containing the first (q + k − 1) elements is computed (Step 1) as the product between (i) the sub-row A_{k,1:q+k−1} of the k-th row of A containing the first (q + k − 1) elements, and (ii) the block Σ_{1:q+k−1,1:q+k−1}, the sub-matrix of Σ obtained by extracting the first (q + k − 1) rows and columns. In Step 2, the sub-column Σ_{1:q+k−1,q+k} of the (q + k)-th column of Σ is computed as the transpose of the sub-row Σ_{q+k,1:q+k−1}. The (q + k)-th diagonal element of Σ is set to S_{q+k,q+k} (Step 3). Steps 1 to 3 are iterated p times over k. Remark 1. In Jöreskog's formula the diagonal elements of Σ are not fixed. In contrast, in FIM these elements are equal to the diagonal elements of S (i.e., diag(Σ) = diag(S)).
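The p iterations of Algorithm 1 translate directly into code. The sketch below uses our own naming, with diag_S holding the empirical variances that FIM places on the diagonal:

```python
import numpy as np

def implied_cov_fim(Phi, A, diag_S):
    """Finite Iterative Method: fill Sigma block by block in p iterations.

    Phi    : (q, q) covariance matrix of the exogenous variables,
    A      : (p, q+p) matrix of structural parameters [Gamma | B],
    diag_S : the q+p empirical variances (FIM fixes diag(Sigma) = diag(S))."""
    q = Phi.shape[0]
    p = A.shape[0]
    Sigma = np.zeros((q + p, q + p))
    Sigma[:q, :q] = Phi                         # initialization
    for k in range(1, p + 1):
        m = q + k - 1                           # size of the block already filled
        row = A[k - 1, :m] @ Sigma[:m, :m]      # Step 1: new sub-row
        Sigma[m, :m] = row
        Sigma[:m, m] = row                      # Step 2: symmetric sub-column
        Sigma[m, m] = diag_S[m]                 # Step 3: diagonal element from S
    return Sigma
```

Each iteration only multiplies a sub-row of A by the block of Σ already filled, which is why no matrix inversion is needed, in contrast with the reduced-form computation.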
Remark 2. Remark 1 implies that FIM does not consider the disturbance terms Ψ when computing Σ. Instead, this matrix can be deduced from Φ, Γ and B as follows [14]:

$$\Psi = (I - B)\,\Sigma_{q+1:q+p,\,q+1:q+p}\,(I - B)^t - \Gamma \Phi \Gamma^t \qquad (10)$$

where I is the identity matrix and Σ_{q+1:q+p,q+1:q+p} is the endogenous block of Σ, itself computed from Φ, Γ and B by FIM. Hence, the vector of parameters to be estimated using FIM is:

$$\theta = \big(\text{the non-null off-diagonal elements of } \Phi \text{ and the non-null elements of } A\big) \qquad (11)$$

According to Remark 1, θ does not contain the diagonal elements of Φ. In addition, some elements of A are null.
In the following, section 3 introduces the basic properties of the covariance matrix implied by a RPAM.

Basic properties of the implied covariance matrix
As mentioned in Section 1, the covariance matrix implied by the model obtained using FIM possesses several useful properties when all variables are supposed to be standardized [16] and Φ is supposed to be fixed. This section generalizes these properties to models whose variables are no longer supposed to be standardized and where the off-diagonal elements of Φ are considered as free parameters. In particular, Theorem 1 below shows that Σ is affine with respect to each element of the vector θ defined in (Eq. 11).
In the sequel, let x and y be two distinct parameters of the model (i.e., elements of θ defined in (Eq. 11)), and let α and β be two real scalars.
• Σ(α) denotes the covariance matrix implied by the model computed by replacing x by α.
• Σ(α, β) denotes the covariance matrix implied by the model computed by replacing x and y by α and β respectively.
Theorem 1. Let x be a parameter of the model. The following equality holds:

$$\Sigma(\alpha) = (1 - \alpha)\,\Sigma(0) + \alpha\,\Sigma(1) \qquad (12)$$

Proof of Theorem 1. The proof of Theorem 1 is based on Lemma 1 below:

Lemma 1. The covariance matrix implied by the model Σ is affine with respect to each element of θ defined in (Eq. 11).
Proof of Lemma 1. See Appendix.
Corollary 1. Let x be a parameter of the model. The first and second derivatives of Σ with respect to x are:

$$\frac{\partial \Sigma}{\partial x} = \Sigma(1) - \Sigma(0) \qquad (13)$$

and

$$\frac{\partial^2 \Sigma}{\partial x^2} = 0 \qquad (14)$$

Proof of Corollary 1. Direct consequence of Theorem 1.
Corollary 2. Let x and y be two distinct parameters of the model. The following equality holds:

$$\Sigma(\alpha, \beta) = (1-\alpha)(1-\beta)\,\Sigma(0,0) + (1-\alpha)\beta\,\Sigma(0,1) + \alpha(1-\beta)\,\Sigma(1,0) + \alpha\beta\,\Sigma(1,1)$$

Proof of Corollary 2. Theorem 1 applied to x gives

$$\Sigma(\alpha, \beta) = (1 - \alpha)\,\Sigma(0, \beta) + \alpha\,\Sigma(1, \beta)$$

Since Σ(0, β) and Σ(1, β) are two covariance matrices implied by the considered model, Theorem 1 applied to y gives

$$\Sigma(0, \beta) = (1 - \beta)\,\Sigma(0, 0) + \beta\,\Sigma(0, 1) \qquad (15)$$

and

$$\Sigma(1, \beta) = (1 - \beta)\,\Sigma(1, 0) + \beta\,\Sigma(1, 1) \qquad (16)$$

The proof is achieved by inserting (15) and (16) in the instance of (Eq. 12) above. ■

Corollary 3. Let x and y be two distinct parameters of the model. The following equality holds:

$$\frac{\partial^2 \Sigma}{\partial x \,\partial y} = \Sigma(0,0) - \Sigma(0,1) - \Sigma(1,0) + \Sigma(1,1)$$

Proof of Corollary 3. Direct consequence of Corollary 2.
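These properties can be checked numerically on a toy recursive model with one exogenous and two endogenous variables (η1 = γ ξ1 + ζ1, η2 = β η1 + ζ2, all variances fixed to 1). The model and all names below are ours, chosen only to make the affinity visible; the sigma function evaluates the FIM iterations for this model by hand:

```python
import numpy as np

def sigma(g, b):
    """FIM-implied covariance of the toy model eta1 = g*xi1 + zeta1,
    eta2 = b*eta1 + zeta2, with all diagonal elements fixed to 1."""
    S = np.eye(3)
    S[1, 0] = S[0, 1] = g            # FIM iteration k = 1
    S[2, 0] = S[0, 2] = b * S[1, 0]  # FIM iteration k = 2
    S[2, 1] = S[1, 2] = b * S[1, 1]
    return S

g0, b0 = 0.4, 0.8

# Theorem 1: Sigma is affine in the parameter g
assert np.allclose(sigma(g0, b0),
                   (1 - g0) * sigma(0.0, b0) + g0 * sigma(1.0, b0))

# Corollary 1: the first derivative needs only two evaluations
dS_dg = sigma(1.0, b0) - sigma(0.0, b0)

# Corollary 3: the cross derivative needs only four evaluations
d2S = sigma(0, 0) - sigma(0, 1) - sigma(1, 0) + sigma(1, 1)
```

Here d2S has a single pair of non-null symmetric entries, the implied covariance between ξ1 and η2, whose value γβ is the only term involving both parameters.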

Didactic example
In what follows, we illustrate the properties of Σ described in Section 3. To do so, the model defined in figure 1 is considered, together with its associated system of structural equations, the resulting matrix A of structural parameters, and the covariance matrix Φ among the exogenous variables.

Element of the gradient vector and element of the Hessian matrix associated with F_ULS
In what follows, we apply the previous formulas to compute the first derivative of the Unweighted Least Squares F_ULS defined in (Eq. 1) with respect to ϕ12, and its second derivative with respect to the pair (ϕ12, γ11). Using (Eq. 19), we obtain the derivatives of Σ with respect to ϕ12. Inserting (Eqs. 19), (20) and (22) in (Eq. 5) then yields the first derivative of F_ULS with respect to ϕ12. In addition, using Corollary 1 and inserting (Eqs. 19), (20), (21), (22) and (23) in (Eq. 6), we obtain the second derivative of F_ULS with respect to ϕ12 and γ11.

Numerical Illustration
In the present section, we compare the results of the present paper to one of the existing methods. Formally, we compare the gradient vectors and the Hessian matrices associated with the Generalized Least Squares function defined in (Eq. 2) computed using two approaches: (i) the classical approach (CA), based on the numDeriv package in R, which uses the Richardson method to compute the gradient vectors and the Hessian matrices [19]; and (ii) the proposed approach (PA), based on Corollaries 1 and 3 for the computation of the first and second derivatives of Σ. To do so, the model given in figure 2 below is considered. It involves five endogenous variables and three exogenous variables, with a total of 12 parameters (3 elements in Φ and 9 elements in A). Thereafter, 100 artificial datasets are generated using the rnorm function in R. For each dataset, the gradient vector and the Hessian matrix of F_GLS are computed using both CA and PA. In order to compare the two approaches, the following quantities are considered:

$$\Delta_g = \big\|g^{(CA)} - g^{(PA)}\big\|, \qquad \Delta_H = \big\|H^{(CA)} - H^{(PA)}\big\|$$

where:
• g^{(CA)} and H^{(CA)} are respectively the gradient vector and the Hessian matrix associated with F_GLS computed by CA,
• g^{(PA)} and H^{(PA)} are respectively the gradient vector and the Hessian matrix associated with F_GLS computed by PA.
∆_g (respectively ∆_H) is the distance induced by the Frobenius norm measuring the difference between the gradient vectors (respectively the Hessian matrices) obtained by the two approaches. The obtained values of ∆_g and ∆_H are depicted in figure 3 below. Figure 3 (Left) shows that ∆_g does not exceed 8.08 × 10^{−8} and has an average value equal to 3 × 10^{−13}. Similarly, figure 3 (Right) shows that ∆_H does not exceed 8.08 × 10^{−8} and has an average value equal to 2.27 × 10^{−9}. Based on this simulation, we can conclude that the two approaches are practically identical.
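The spirit of this comparison is easy to reproduce. Below, the exact derivative from Corollary 1 is compared with a plain central finite difference (a simple stand-in for numDeriv's Richardson extrapolation) on a small toy model of our own, not the model of figure 2:

```python
import numpy as np

def sigma(g, b):
    """FIM-implied covariance of a toy model (eta1 = g*xi1 + zeta1,
    eta2 = b*eta1 + zeta2, unit variances); stands in for figure 2's model."""
    S = np.eye(3)
    S[1, 0] = S[0, 1] = g
    S[2, 0] = S[0, 2] = b * g
    S[2, 1] = S[1, 2] = b
    return S

g0, b0, h = 0.4, 0.8, 1e-6

# PA: exact derivative with respect to g (Corollary 1, two evaluations)
exact = sigma(1.0, b0) - sigma(0.0, b0)

# CA stand-in: central finite-difference approximation with step h
numeric = (sigma(g0 + h, b0) - sigma(g0 - h, b0)) / (2 * h)

# Frobenius distance between the two, analogous to Delta_g and Delta_H
delta = np.linalg.norm(exact - numeric)
```

Here delta is of the order of the rounding error, mirroring the near-zero ∆_g and ∆_H values observed in figure 3; the affinity-based derivative, however, is exact and requires no step-size tuning.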

Summary and perspectives
The present paper proposes some relevant properties of the covariance matrix implied by a RPAM with non-standardized variables. These properties are important in the sense that they allow a simple calculation of the derivatives required for the optimization of the fit function used to estimate the model's parameters. At first sight, it may appear that these derivatives are simple to compute without these properties. However, the proposed properties are valid for every RPAM without knowing the model explicitly. That is, they allow programming optimization methods without approximating either the gradient vectors or the Hessian matrices. As aforementioned, the results are valid for the implied covariance matrices obtained using the FIM algorithm (Algorithm 1). The first property, given in Lemma 1, is the fundamental characteristic asserting the affinity of the covariance matrix implied by a RPAM with respect to each parameter. The other properties give an expression of this matrix as a function of the model parameters, as well as the exact expression of the first and second derivatives of this matrix with respect to each parameter and with respect to each pair of parameters. We illustrated the advantages of the proposed properties using a didactic example. We also simulated an illustrative example and compared the derivatives obtained using the proposed approach with a classical approach. This simulation confirms that the two approaches are practically identical. These properties will enhance the development of estimation methods in RPAM. More precisely, they make the optimization step easier, faster, and more flexible.
An R program is written to implement all the defined properties of the implied covariance matrix as functions. This program is available upon request.
Moreover, research aiming to prove or disprove these properties for the covariance matrix implied by an exploratory factor analysis model [20] is underway.
Meanwhile, it would be interesting to extend these properties to RPAM with correlated errors. This extension is also in progress.
As a perspective, we are currently working on extending these properties into the covariance matrix implied by a recursive structural equation model with latent variables [21].
Finally, research on the development of FIM for non-recursive PAM remains open.