GH Biplot in Reduced-Rank Regression Based on Partial Least Squares

One of the challenges facing statisticians is to provide tools that enable researchers to interpret and present their data and conclusions in ways easily understood by the scientific community. One of the tools available for this purpose is a multivariate graphical representation called the reduced rank regression biplot. This biplot describes how to construct a graphical representation in nonsymmetric contexts such as least squares approximations in multivariate linear regression models of reduced rank. However, multicollinearity invalidates the interpretation of a regression coefficient as the conditional effect of a regressor given the values of the other regressors, and hence makes biplots of regression coefficients useless. In the search to overcome this problem, Álvarez and Griffin [1] presented a procedure for coefficient estimation in a multivariate reduced rank regression model in the presence of multicollinearity, based on PLS (Partial Least Squares) and the generalized singular value decomposition. Based on these same procedures, a biplot construction is now presented for a multivariate reduced rank regression model in the presence of multicollinearity. This procedure, called the PLSSVD GH biplot, provides a useful data analysis tool which allows the visual appraisal of the structure of the dependent and independent variables. This paper defines the procedure and establishes several of its properties. It also provides an implementation of the routines in R and presents a real life application involving data from the FAO food database to illustrate the procedure and its properties.


Introduction
One of the challenges facing statisticians is to provide tools that enable researchers to interpret and present their data and conclusions in ways easily understood by the scientific community. Among the many tools now available for this purpose are multivariate graphical representations arising from factorial families such as principal component analysis, correspondence analysis and biplot analysis. These tools are highly appreciated and often used by researchers. The biplot, for example, was originally introduced by Gabriel in 1971 as a graphical representation of the rows and columns of a data matrix based on the singular value decomposition (SVD) [7], and is widely used, having been incorporated in the majority of statistical packages. Biplot analysis approximates a data matrix Y by another matrix Y_r = G_r H'_r, where G_r and H_r are matrices of full column rank obtained from the first r terms of its singular value decomposition. The row vectors of G_r are called row markers, the row vectors of H_r are called column markers, and the elements of the original data matrix are approximated by the inner products of these markers. Gabriel called this factorization a biplot factorization. Three classical biplots are the GH biplot (column metric preserving biplot), the JK biplot (row metric preserving biplot) and the SQRT biplot (symmetric biplot). Examples of biplot applications can be found in many different areas, including medicine ([16], [10]). This paper addresses three questions: 1. Is it possible to use the theory of partial least squares (PLS) to construct biplots in reduced rank regression? If so, 2. what properties characterize this graphical representation? 3. Is it necessary to take the multicollinearity into account when constructing the biplot, or is it possible to construct and interpret the biplot in the classical way?
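To fix ideas, Gabriel's biplot factorization described above can be sketched in a few lines. The routines accompanying this paper are in R; the following NumPy fragment is only an illustrative sketch of the GH factorization (row markers G_r = U_r, column markers H_r = V_r D_r), with made-up data.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 4))   # toy (n x q) data matrix

# Singular value decomposition Y = U D V'
U, d, Vt = np.linalg.svd(Y, full_matrices=False)

r = 2                          # rank of the approximation
G = U[:, :r]                   # row markers: G_r = U_r (GH biplot)
H = Vt[:r, :].T * d[:r]        # column markers: H_r = V_r D_r
Y_r = G @ H.T                  # rank-r biplot approximation of Y
```

The inner product of the i-th row marker with the j-th column marker approximates the (i, j)-th element of Y, and Y_r is exactly the best rank-r least squares approximation of Y.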
We will now look at some preliminary facts which will help to provide the answers to these questions.

Preliminary ideas
As done by many authors, for example [3], [12], [6], [2] and [1], the study of multivariate reduced rank regression (RRR) can be considered a special case of the classic multivariate regression model

Y = XΘ + E,    (1)

in which Y is an (n × q) matrix whose rows contain the values of the q dependent variables measured on n subjects, and X is an (n × p) matrix whose rows contain the values of the p independent variables measured on the same n subjects. We assume that X is fixed from sample to sample. Θ is a (p × q) matrix of parameters or regression coefficients and E is the (n × q) matrix of errors or random perturbations.
In total agreement with the ideas of Ter Braak and Looman [18], Álvarez and Griffin [1] proceeded to factor the coefficient matrix as Θ = AB', so that the multivariate reduced rank regression model became Y = XAB' + E. The estimation of this model begins by taking as an initial estimate of Ŷ the one obtained by SVD as

(Ŷ Γ) = Q D V',    (2)

where Q and V are orthogonal matrices of order (n × t) and (q × t) respectively, with t = min(n, q), containing the eigenvectors of (Ŷ Γ)(Ŷ Γ)' and (Ŷ Γ)'(Ŷ Γ) respectively; D is a diagonal matrix containing the corresponding singular values in non-increasing order (λ_1 ≥ λ_2 ≥ ... ≥ λ_t ≥ 0); and Γ is a positive definite weight matrix. Up to this point, the estimation procedures proposed by [18] and [1] coincide. However, when estimating A = (X'X)^{-1} X' Q, the inconvenience of multicollinearity in the variables X arises. Since Q = XA, a comparison can be made with the regression model in which Q represents the matrix of dependent variables, X the matrix of independent variables and A the matrix of regression coefficients. This motivated Álvarez and Griffin [1] to apply Partial Least Squares Regression (PLSR) to estimate A. PLSR is a technique that generalizes and combines characteristics of principal component analysis and multivariate regression. It is particularly useful when a set of dependent variables must be predicted from a set of independent variables in the presence of severe multicollinearity, in which case the classical multivariate regression procedures have serious stability problems. PLSR is a good alternative in such circumstances since it uses basic iterative concepts to fit bilinear models in several blocks of variables, giving a solution for multivariate statistics in the presence of many collinear variables and a modest number of observations.
The solution of each stage permits the construction of the following matrices (see [1] for details): W_h, W̃_h, T_h and P_h, where the subscript h indicates that the matrix comprises the sequence of the first h vectors. W̃_h spans the loading weights of the X-variables for a certain number of latent variables. Given ([17], p. 114), the two sets of weights w_i and w̃_i, i = 1, 2, ..., h, span the same space. The matrix T_h contains the scores of the X-variables and P_h is called the X-loadings. The relations between these matrices determine the PLS estimation of the model (1). Phatak and De Jong in 1997 [15] found the following very useful expression for the regression coefficient matrix obtained by PLS regression:

Θ̂_PLS = W_h (P'_h W_h)^{-1} P'_h Θ̂_OLS.

This finding is of utmost importance since, as they themselves state, the matrix W_h (P'_h W_h)^{-1} P'_h is idempotent, which implies that it is a projection matrix. Since it is not symmetric, it defines an oblique projection, which projects Θ̂_OLS onto the space generated by the columns of W_h. If we now apply this theory to the estimation of A using partial least squares, where Q = XA is the model to be estimated, then

Â_PLS = W_h (P'_h W_h)^{-1} P'_h Â_OLS,

which permits us to see the row markers (Â_PLS) obtained from the PLS estimation as an oblique projection of the row markers obtained from a biplot in reduced rank regression using the OLS estimation (Â_OLS). In addition, using this initial estimate, the matrix of regression coefficients is obtained by combining the PLS estimation of A with the estimation of B given by SVD, so that the reduced rank regression coefficient matrix given by PLSSVD is Θ̂_PLS-SVD = Â_PLS B̂'_SVD. Finally, the estimations of Y proposed by [1] are given by Ŷ_PLS-SVD = X Â_PLS B̂'_SVD. These procedures described in [1] provide the basis for this article, which describes how to construct a biplot in reduced-rank regression in the presence of multicollinearity. We focus on describing the properties of Θ̂_PLS-SVD and its application to an example using data from the FAO database.
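The oblique-projection reading of the Phatak–De Jong expression can be checked numerically. The sketch below is an illustrative NumPy reimplementation, not the authors' R code: the NIPALS-style helper pls_wtp is an assumed, simplified PLS2 routine used only to produce weights W_h, loadings P_h and scores T_h so that the idempotency of W_h (P'_h W_h)^{-1} P'_h can be verified.

```python
import numpy as np

def pls_wtp(X, Y, h):
    """Illustrative NIPALS-style PLS2: weights W_h, X-loadings P_h, scores T_h."""
    E, F = X.copy(), Y.copy()
    W, P, T = [], [], []
    for _ in range(h):
        w = np.linalg.svd(E.T @ F)[0][:, 0]      # dominant left singular vector of E'F
        t = E @ w                                # X-scores for this latent variable
        p = E.T @ t / (t @ t)                    # X-loadings
        W.append(w); P.append(p); T.append(t)
        E = E - np.outer(t, p)                   # deflate the X-block
        F = F - np.outer(t, F.T @ t / (t @ t))   # deflate the Y-block
    return np.column_stack(W), np.column_stack(P), np.column_stack(T)

rng = np.random.default_rng(1)
n = 40
X = rng.normal(size=(n, 6))
Y = X @ rng.normal(size=(6, 3)) + 0.1 * rng.normal(size=(n, 3))

W, P, T = pls_wtp(X, Y, h=2)
Pi = W @ np.linalg.inv(P.T @ W) @ P.T            # W_h (P_h' W_h)^{-1} P_h'
Theta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
Theta_pls = Pi @ Theta_ols                       # oblique projection of the OLS coefficients
```

Whatever the PLS details, Pi @ Pi equals Pi by construction (so Pi is a projector), while Pi is not symmetric, which is exactly the oblique-projection property used in the text.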
The rest of this article is divided into three parts. The first describes the construction of the GH biplot in reduced-rank regression based on partial least squares, which we have named the PLSSVD GH biplot. The following core section describes the properties of this biplot and provides their proofs; prior reading of the articles [18], [15] and [1] is important for understanding this section. In the last part, we provide a real life example using food data from the FAO database, together with routines in R, provided in an appendix, to apply the techniques and illustrate the properties of this biplot.

Biplot in reduced-rank regression
The chief cornerstone for the construction of a biplot in reduced-rank regression in the presence of multicollinearity using PLS is found in the paper of Ter Braak and Looman [18]. They showed how to factor the regression coefficient matrix using the techniques of [8] with OLS and SVD to obtain the biplot in reduced rank regression (RRR). This procedure for the RRR biplot can be summarized as follows:
1. Center or standardize the matrices X (n × p) and Y (n × q).
2. Express the multivariate regression model (1) in the form Y Γ = X A B' Γ + E Γ, with Γ a given weight matrix.
3. Obtain the minimum of the rank r approximation by first applying the singular value decomposition (Ŷ_OLS Γ) = U D V'; the minimum is then obtained by setting Â = (X'X)^{-1} X' U_r and B̂ = Γ^{-1} V_r D_r. It is important to point out that this singular value decomposition and the one in (2) differ only in the matrices Q and U (see [18], p. 991).
4. Finally, the estimation of the regression coefficient matrix is Θ̂ = Â B̂'.
Remember that Γ is a given weight matrix. The results of the estimation procedure are affected by this weight matrix, whose importance is amply explained in [1], [19] and [18]. In this paper Γ = I will be used.
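The four steps above can be sketched directly. This is an illustrative NumPy version (the paper's own implementation is in R) of the classical OLS-based RRR biplot with Γ = I, using simulated data.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, r = 50, 4, 3, 2
X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=(p, q)) + 0.1 * rng.normal(size=(n, q))

# Step 1: center the data matrices
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

# Step 2: full-rank OLS fit of Y = X Theta + E (Gamma = I)
Theta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ Theta_ols

# Step 3: SVD of the fitted values, Y_hat = U D V'
U, d, Vt = np.linalg.svd(Y_hat, full_matrices=False)
A = np.linalg.solve(X.T @ X, X.T @ U[:, :r])  # row markers A = (X'X)^{-1} X' U_r
B = Vt[:r, :].T * d[:r]                       # column markers B = V_r D_r

# Step 4: rank-r regression coefficient matrix
Theta_r = A @ B.T
```

Because the columns of Ŷ lie in the column space of X, X A reproduces U_r exactly, and X Θ̂_r is the best rank-r approximation of Ŷ.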

Properties of the GH BIPLOT in Reduced-Rank Regression based on PLS (PLSSVD GH Biplot)
[Stat., Optim. Inf. Comput., Vol. 9, September 2021 — Willin Álvarez and V. Griffin]
As we previously established, a biplot for the reduced rank regression coefficients begins by minimizing the least squares criterion ∥(Y − X A B') Γ∥²; throughout this section we take M = X'X. In the introduction we formulated some questions regarding the biplot in reduced-rank regression based on PLS in relation to its properties and interpretation. Two additional pertinent questions are: 1. how is the geometry of PLS involved in the properties? and 2. how simple or complicated is the mathematics needed to prove them? One of the reasons for writing this paper is to show the simplicity of the properties and how to apply this very useful data analysis tool. We now show how to use the property (12) and the geometric properties of PLS to establish the properties of the PLSSVD GH Biplot.

Properties of the column markers of the PLSSVD GH Biplot
Property 1. Under the metric M = X'X, the row markers satisfy Â'_PLS M Â_PLS = I_r.
▹ In essence: writing Â'_PLS M Â_PLS = Â'_PLS X'X Â_PLS and associating conveniently, since T_h = X W_h we can substitute this in the previous expression; associating and transposing, and using the fact that X = T_h P'_h, the products collapse to the identity I_r.
Property 2. The scalar product of the columns of Θ̂_PLS-SVD using the metric M approximates the scalar product of the column markers.
▹ In effect: Θ̂'_PLS-SVD M Θ̂_PLS-SVD = B̂_SVD Â'_PLS X'X Â_PLS B̂'_SVD = B̂_SVD B̂'_SVD, associating conveniently and applying Property 1.
Property 3. When the variables are centered, the squared lengths of the column markers approximate the sample covariances between the dependent variables.
▹ In effect: let S be the sample covariance matrix and let Ŷ_PLS-SVD be centered at its mean, so that S = (1/n) Ŷ'_PLS-SVD Ŷ_PLS-SVD. By corollary 1, and using equation (10) together with equations (11) and (12), S ≈ (1/n) B̂_SVD B̂'_SVD. From this we derive that the sample variance of a dependent variable can be expressed as s²_j ≈ (1/n) ∥b_j∥², and we can see that the standard deviation of any variable can be estimated, up to the scaling by the sample size, by the length of the associated column marker.
Property 4. The cosine of the angle between two column markers approximates the correlation between the dependent variables centered at their mean. ▹
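These column-marker properties can be verified numerically. Since the PLS row markers only approximate the OLS-based ones, the NumPy sketch below (illustrative code, not the paper's R routines) uses the OLS-based RRR construction with Γ = I, for which the identities hold exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q, r = 60, 5, 4, 2
X = rng.normal(size=(n, p)); X -= X.mean(axis=0)
Y = X @ rng.normal(size=(p, q)) + 0.1 * rng.normal(size=(n, q)); Y -= Y.mean(axis=0)

M = X.T @ X
Theta_ols = np.linalg.solve(M, X.T @ Y)
U, d, Vt = np.linalg.svd(X @ Theta_ols, full_matrices=False)
A = np.linalg.solve(M, X.T @ U[:, :r])  # row markers
B = Vt[:r, :].T * d[:r]                 # column markers
Theta_r = A @ B.T                       # rank-r coefficient matrix
Y_r = X @ Theta_r                       # fitted values of the rank-r model
```

The assertions below restate Properties 1–4: A'MA = I_r; the metric-M scalar products of the columns of Θ̂ equal B B'; the cross-products of the fitted, centered variables equal B B' (so squared marker lengths give variances up to 1/n); and the cosines between column markers equal the correlations of the fitted variables.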

In effect: let b_j and b_k be two distinct column markers and α the angle between them. By definition, cos α = b'_j b_k / (∥b_j∥ ∥b_k∥). The variables are centered at their mean, so that by applying Properties 2 and 3 we have cos α ≈ s_jk / (s_j s_k), where s_jk is the sample covariance of the variables ŷ_j and ŷ_k, and s_j and s_k are the standard deviations of ŷ_j and ŷ_k, respectively.
Property 5. The quality of the column representation (QCR) under the metric M = X'X and the weight matrix Γ = I can be measured by the proportion of the total variability captured by the retained singular values. We know that the sample covariance matrix S is given by S = (1/n) Ŷ'_PLS-SVD Ŷ_PLS-SVD. By hypothesis Γ^{-1} = I, so that B̂_SVD B̂'_SVD = V D² V', which means that the total variability can be expressed as trace(V D² V') = λ²_1 + λ²_2 + ... + λ²_t. Before proving this property we should take into account property 6, where it was established that ...

Properties of the Row Markers
Taking into account that V is orthonormal, we have V'V = I_r, and, additionally, since D D^{-2} D = I_r, the corresponding identity for the row markers follows. In addition, the PLSSVD regression coefficients of the matrix Θ̂_PLS-SVD = Â_PLS B̂'_SVD, following the contributions of Gabriel [8], are approximated by the inner product of the row and column markers using the expression

θ̂_ij ≈ ⃗a'_PLSi ⃗b_SVDj = ∥⃗a_PLSi∥ ∥⃗b_SVDj∥ cos α = sign · ∥(⃗a_PLSi)_j*∥ ∥⃗b_SVDj∥,

where α is the angle between ⃗a_PLSi (the i-th row marker) and ⃗b_SVDj (the j-th column marker), sign is equal to 1 or -1 depending on whether the angle is acute or obtuse, respectively, and (⃗a_PLSi)_j* denotes the orthogonal projection of the row marker ⃗a_PLSi on the column marker ⃗b_SVDj. If the rank of Θ̂_PLS-SVD is r = 2, the biplot is defined as the graphical representation in a plane of the row markers ⃗a_PLSi = ⃗a_i = (a_i1, a_i2)' and the column markers ⃗b_SVDj = ⃗b_j = (b_j1, b_j2)', both vectors having two elements, such that for each i and each j, θ̂_ij ≈ ⃗a'_i ⃗b_j. Considering the above, to approximate by means of a biplot the element θ̂_13 of a hypothetical PLSSVD regression matrix of order (4 × 3) and rank 2, we should set θ̂_13 ≈ ⃗a'_1 ⃗b_3, where ⃗a_1 = (a_11, a_12)' and ⃗b_3 = (b_31, b_32)'.
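The inner-product reading of θ̂_ij can be illustrated with a tiny hypothetical example matching the (4 × 3), rank-2 case above; the markers here are random stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 2))   # hypothetical row markers a_i
B = rng.normal(size=(3, 2))   # hypothetical column markers b_j
Theta = A @ B.T               # the (4 x 3), rank-2 coefficient matrix they encode

# theta_ij as signed projection length times marker length
i, j = 0, 2
b_norm = np.linalg.norm(B[j])
proj = (A[i] @ B[j]) / b_norm  # signed length of the projection of a_i on b_j
theta_ij = proj * b_norm       # equals the inner product a_i . b_j

# ordering the projections on b_1 orders the first column of Theta
order = np.argsort(-(A @ B[0]))  # indices sorted by decreasing projection
```

Since ∥⃗b_j∥ is common to every element of column j, ranking the signed projections of the row markers on ⃗b_j ranks the coefficients of that column, which is exactly how Figure 1c is read.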
Figure 1 shows how θ̂_ij can be interpreted in the plane for the hypothetical PLSSVD regression coefficient matrix of order (4 × 3) and rank 2. Figure 1a shows how each row marker ⃗a_PLSi = ⃗a_i should be orthogonally projected on the column marker ⃗b_SVDj = ⃗b_j, while in Figure 1b one can visualize in the plane the biplot of the PLSSVD regression coefficient matrix, its row and column markers, as well as the projections of the row markers on the column marker ⃗b_1. Figure 1c shows the ordering of the projections (a_i)_1* (in the graph, bold letters represent vectors, that is, ⃗a = a) on the column marker b_1, so that sign∥(a_1)_1*∥ ≥ sign∥(a_3)_1*∥ ≥ sign∥(a_2)_1*∥ ≥ sign∥(a_4)_1*∥. From this information, and taking into account that ∥⃗b_1∥ is the same for the approximation of each element of the first column of the (4 × 3) regression coefficient matrix, it is possible to approximate the ordering of the regression coefficients of the first column of the matrix as θ̂_11 > θ̂_31 > θ̂_21 > θ̂_41. For more detail we refer the reader to [4] (p. 145).

The Algorithm
In the appendix we have included the pls-svd algorithm, which is based on the ideas of [17], [11], [21] and [1]. The algorithm begins by assigning the two sets under study: the set of dependent variables Y and the set of independent variables X. One of the requirements of the algorithm is that the columns of X and Y be standardized with mean zero and variance 1. To perform this, the function standardize() was implemented in R. The standardizations of Y and X are called Fo and Eo respectively. In [1], we established that if Γ = I, then B_SVD = V D. The initial estimation of the model Q = XA uses the matrix Q obtained from the singular value decomposition of Fo (in R, Q = L$u, where L is the object returned by svd(Fo)). It is important to remember that the method described in this article is useful when the set of independent variables X has severe multicollinearity. In this study the diagnosis of collinearity is performed through the eigensystem of X'X as proposed by [14], using the function CollinDiag(X) to calculate the condition number of X'X. When this number is less than 100, there does not exist a serious problem of multicollinearity and the routine outputs the message "there isn't multicollinearity". If, on the other hand, the condition number is between 100 and 1000, the multicollinearity is moderate to strong, so the routine outputs the message "Moderate to strong multicollinearity". Lastly, if the condition number is greater than 1000, the multicollinearity is severe and the routine outputs the message "Severe multicollinearity".
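The diagnostic just described is easy to reproduce. The function below is an assumed Python analogue of the paper's CollinDiag R routine (same thresholds and messages; the name collin_diag and the example data are ours), computing the condition number of X'X as the ratio of its largest to smallest eigenvalue.

```python
import numpy as np

def collin_diag(X):
    """Condition number of X'X with the thresholds described in the text
    (an assumed Python analogue of the paper's CollinDiag R function)."""
    eig = np.linalg.eigvalsh(X.T @ X)
    kappa = eig.max() / eig.min()
    if kappa < 100:
        return kappa, "there isn't multicollinearity"
    if kappa <= 1000:
        return kappa, "Moderate to strong multicollinearity"
    return kappa, "Severe multicollinearity"

rng = np.random.default_rng(6)
x1 = rng.normal(size=50)
X_bad = np.column_stack([x1, x1 + 1e-5 * rng.normal(size=50)])  # nearly collinear pair
```

A pair of almost-identical columns produces a huge condition number and triggers the "Severe multicollinearity" message, while a matrix with orthonormal columns has condition number 1.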
The implementation used here differs slightly from the algorithm presented in [1] regarding the calculation of the vectors w_h and r_h: following the suggestions of [21], we begin here with the corresponding eigen-equations. After executing the pls-svd algorithm, we calculate the estimation of the PLSSVD regression coefficient matrix Θ̂_PLS-SVD as Theta <- Apls %*% t(B), and with the obtained elements we proceed to declare the labels of the biplot using the instructions biplot(Apls, B, var.axes = TRUE, xlab = "Comp1", ylab = "Comp2", xlabs = label1, ylabs = label2, main = "BIPLOT PLS-SVD"); abline(h = 0, v = 0).

Illustration using Food composition database for biodiversity
The FAO is an agency of the United Nations which designs strategies to fight hunger. The principal objective of this agency is to assure the food supply for all peoples, so that access to sufficient good quality food is guaranteed on a regular basis in order to lead an active and healthy life. The organization has more than 194 members and operates in more than 130 countries. Among its different functions is that of providing data to the nations to help them achieve their own specific objectives. In this section we apply the previously described biplot to FAO data. Specifically, we use the Food composition database for biodiversity, BioFoodComp 4.0, which can be accessed at http://www.fao.org/infoods/infoods/tables-and-databases/faoinfoodsdatabases. The FAO states in its own words, "The Biodiversity Database is a global repository of analytical data on food biodiversity of acceptable data quality". For our analysis we have selected fruit from the Biodiversity Database BioFoodComp 4.0 with the objective of constructing a biplot graphical representation to explain the content of vitamins and carotenoids (dependent variables Y) of two fruit groups, banana and papaya, by means of mineral content and trace elements (independent variables X). For that purpose we included rows 16 through 36 of the original Excel database. The resulting dependent variables to be explained are shown in Table 1, while the explanatory (independent) variables are shown in Table 2. Table 3 contains the correlation matrix for the data. There we can appreciate the correlations among the dependent variables (R_yy), those among the independent variables (R_xx), and the correlations between the dependent and independent variables (R_xy). The dependent variables, made up of vitamins and carotenoids, show high positive correlations between vitamin A retinol activity equivalent (VITA.RAE) and beta-carotene (CARTB).
These have moderate correlation with lutein (LUTN) and low correlation with vitamin C (VITC). Lutein (LUTN) and vitamin C (VITC) present low correlation between themselves. On the other hand, in the set of independent variables (minerals and trace elements), the amount of water (WATER) has high correlations with almost all of the other independent variables except boron (B): water (WATER) shows a high positive correlation with calcium (CA) and high negative correlations with the rest. Boron (B) has low to moderate correlation with the rest of the independent variables. We also see that the other independent variables each have a high correlation with at least one of the remaining variables, which is indicative of the existence of multicollinearity in this set of variables. Regarding the correlations of the minerals and trace elements with the vitamins and carotenoids, we see that vitamin A retinol activity equivalent (VITA.RAE) and beta-carotene (CARTB) are the variables that present the highest correlations; lutein (LUTN) has almost all moderate correlations, and vitamin C (VITC) presents its highest correlation with zinc (ZN) and low to moderate relations with the rest.
To explain the relation between the variables we use the model described in (1). We begin with the diagnosis of multicollinearity using the function CollinDiag(X) described previously and provided in the appendix. The independent variables X present severe multicollinearity, which means that the classic RRR biplot ([18]) cannot be applied to this data. In the resulting representation, the first component accounts for 68.70% of the variability and the second component for 24.81%, which means that the first factorial plane has the capacity to represent 93.51% of the total. Remember that Property 4 shows that the correlation between two dependent variables is approximated by the cosine of the angle between the respective column markers; since the column markers of the variables VITA.RAE and CARTB are almost coincident, the relation between the two variables is close to unity. Verifying in Table 3, we see that the correlation between these two variables is 0.99. Additionally, in Figure 2 we can see that the markers of these two variables are closer to the variable LUTN than to VITC, which shows that their relations with LUTN are greater than with VITC. We can also observe that the markers for LUTN and VITC manifest moderate relations between their respective variables.
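The percentages of variability quoted above are the usual spectral shares, λ²_i / Σ λ². A minimal NumPy sketch of their computation (with random stand-in data, not the FAO values):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.normal(size=(21, 4))       # stand-in for the approximated matrix
d = np.linalg.svd(M, compute_uv=False)

share = 100 * d**2 / np.sum(d**2)  # % of total variability per component
plane = share[0] + share[1]        # % represented by the first factorial plane
```

The shares sum to 100% and are non-increasing, and the first-plane figure is simply the sum of the first two.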
The most important objective of this biplot is to provide a graphical representation ofΘ P LS−SV D , the regression coefficient matrix which can be seen in Table 4. To highlight the usefulness of this graphical representation we have constructed Figure 3 which has a slight change from Figure 2: the biplot axis corresponding to the variable VITC has been lengthened and painted in blue.
As a second step, each of the row markers corresponding to the minerals and trace elements has been projected orthogonally on this column marker. Note that an ordering of the projections has thereby been established on this blue axis. This order is very close, as can be noted, to the order found in Table 4, which shows that the biplot gives a good approximation of the multiple regression model of the variable VITC with respect to the predictor variables. Lastly, if we perform a similar procedure for the other column markers, we will note that the biplot provides a good approximation to the other multiple regression models.

Conclusions
We have shown how to prepare a biplot for reduced rank regression when the model exhibits multicollinearity. In the example it is evident that this occurs and that the use of partial least squares provides a good alternative when a representation of the coefficient matrix is desired. We have seen that the generalized singular value decomposition and the use of a weight matrix both play a fundamental role in achieving this goal. Thanks to the geometry of PLS, we were able to show that the row markers in the PLSSVD GH biplot are an oblique projection of the row markers of the RRR biplot.
A summary of the properties derived using this methodology and described in this article is as follows.
1. In the PLSSVD GH biplot under the metric M = X'X we have shown that Â'_PLS M Â_PLS = I_r.
2. The scalar product of two columns of Θ̂_PLS-SVD under the metric M approximates the scalar product of the corresponding column markers.
3. The scalar product of two columns of Ŷ_PLS-SVD approximates the scalar product of the corresponding columns of Θ̂_PLS-SVD under the metric M.
4. When the variables are centered, the squared lengths of the column markers approximate the sample covariances between the dependent variables.
5. The cosine of the angle between two column markers approximates the correlation between the dependent variables centered on their mean.
6. The quality of the representation of the columns (QRC), under the metric M = X'X and weight matrix Γ = I, can be measured by the proportion of the total variability captured by the retained singular values.
7. The scalar product of two rows of Θ̂_PLS-SVD under the metric M = X'X and weight matrix Γ = I approximates the scalar product of the corresponding row markers.
All these properties are subject to the given weight matrix Γ = I, and even for these weights we are sure that there remain many more properties to state and prove. It is important to say that research remains open for other weight matrices such as, for example, Γ = R_yy^{-1/2}. This study has been based only on the GH biplot (column metric preserving biplot), so that research remains to be done for the other classical biplots, including the JK biplot and the SQRT biplot. Another important aspect to emphasize is that this research was done in response to the question posed in [18] and repeated in [1]: why can one expect PLS regression methods to perform better than multiple linear regression, ridge regression and other well known regression techniques? Even so, we believe that there exist further opportunities for research in this area, for example, investigating the use of the ridge regression technique instead of PLS and comparing the biplots obtained through the different techniques. Many authors agree that in order to improve the approximations of the biplots, scaling should be done. Scaling was not performed in this article, so we invite other researchers to dive deeper into this and other biplot themes.