Non-parametric Multivariate Kernel Regression Estimation to Describe Cognitive Processes and Mental Representations

In this paper, we put forward a non-parametric multivariate recursive kernel regression estimator under missing data, using the propensity score approach, in order to describe written word production. Our main objective is to explore the cognitive processes and mental representations mobilized when a human being prepares to write a word, following the idea developed in [21]. We investigate the asymptotic properties of the proposed recursive estimator and compare them to those of the well-known Nadaraya-Watson regression estimator. We calculate the bias and the variance of the proposed estimator, which depend on the choice of parameters such as the stepsize and the bandwidth, and we examine some data-driven procedures to select these parameters. We then demonstrate that, under suitable optimal choices of these parameters, the MSE (Mean Squared Error) of the proposed estimator can be smaller than that of the Nadaraya-Watson regression estimator. The elaborated estimator is then applied to behavioral data in order to classify participants into groups. This classification may serve as a departure point for tackling variations in written behavior.


Introduction
Research on handwritten word production aims to describe the cognitive processes and mental representations mobilized when a human being prepares to handwrite a word from an idea (see [21]). The most frequently used method to explore this issue relies on relating a behavioral variable, the reaction time, to a set of factors meant to index different cognitive treatments (e.g., [3], [20]). One can imagine variations in the cognitive treatments performed by participants, which could result in variations in the relationship between the behavioral variable and the explanatory factors. The intrinsic target lies in being able to group participants with similar degrees of variation. In order to achieve this purpose, we resort to regression analysis, that is, the study of how a response variable depends on one or more predictors; it is a reliable method for identifying which variables have an impact on a topic of interest. Since the response may be missing, we adopt the propensity score approach, the propensity score being defined by ψ_i = P[δ_i = 1 | T_i], for all i ∈ {1, . . . , n}, where δ_i indicates whether the response T_i is observed.
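To illustrate the role of the propensity score, here is a minimal synthetic sketch (the distribution, the constant propensity and all variable names are illustrative assumptions, not taken from the study): inverse-propensity weighting recovers the mean of a response that is only partially observed.

```python
import numpy as np

# Assumed synthetic setup: T is the latent response (e.g., a reaction time),
# delta the missingness indicator, psi the propensity score P[delta = 1 | T],
# taken constant here for simplicity.
rng = np.random.default_rng(0)
n = 200_000
T = rng.normal(loc=2.0, scale=1.0, size=n)
psi = 0.7
delta = rng.random(n) < psi
Y = np.where(delta, T, 0.0)    # corrected response Y_i = T_i * 1{T_i observed}

# Inverse-propensity weighting recovers the mean of T from incomplete data:
print(Y.mean() / psi)          # close to E[T] = 2.0
```

When ψ depends on T, the same correction applies with individual weights ψ_i, which is the form used by the estimator below.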
In the remainder, Y denotes the response variable of interest and X its associated regressor vector. Our basic purpose in this paper is to propose a recursive estimator of the regression function p(x) = E[T | X = x] under missing data. Our aim then resides in building a stochastic algorithm which approximates the function m(x) = ∫ y g(x, y) dy at a given vector x, where g denotes the joint density of (X, Y). For this reason, we define an algorithm searching for the zero of the function ϕ : y ↦ m(x) − y. We therefore proceed as follows: we fix m_0(x) ∈ R, and then we set, for all n ≥ 1,
$$ m_n(x) = m_{n-1}(x) + \beta_n U_n(x), $$
where (β_n) is a positive sequence of real numbers decreasing towards zero and U_n(x) is an observation of the function ϕ at the point m_{n−1}(x). In order to construct U_n(x), we adopt the approach considered first by [23], [34] and more recently by [31]: we introduce a multivariate kernel K, i.e., a function satisfying ∫_{R^d} K(t) dt = 1, and a bandwidth (h_n), a sequence of positive real numbers that tends to zero. Setting
$$ U_n(x) = Y_n \psi_n^{-1} h_n^{-d} K\left(\frac{x - X_n}{h_n}\right) - m_{n-1}(x), $$
the stochastic approximation algorithm that we consider to estimate recursively the function m at a vector x can be expressed as
$$ m_n(x) = (1 - \beta_n)\, m_{n-1}(x) + \beta_n Y_n \psi_n^{-1} h_n^{-d} K\left(\frac{x - X_n}{h_n}\right). $$
Throughout this paper, we consider that m_0(x) = 0, and we set Q_n = \prod_{j=1}^{n}(1 − β_j). It follows that
$$ m_n(x) = Q_n \sum_{k=1}^{n} Q_k^{-1} \beta_k Y_k \psi_k^{-1} h_k^{-d} K\left(\frac{x - X_k}{h_k}\right). $$
Moreover, we use the recursive multivariate estimator of the probability density function f defined in [15], constructed with the same stochastic approximation tools; under the condition that f_0(x) = 0, it follows that
$$ f_n(x) = Q_n \sum_{k=1}^{n} Q_k^{-1} \beta_k h_k^{-d} K\left(\frac{x - X_k}{h_k}\right). $$
In this paper, we consider the following recursive estimator of the regression function p:
$$ p_n(x) = \frac{m_n(x)}{f_n(x)} \,\mathbb{1}_{\{f_n(x) \neq 0\}}. \qquad (4) $$
We explore the asymptotic properties of the proposed multivariate recursive kernel regression estimator. Afterwards, we compare it to the multivariate non-recursive Nadaraya-Watson regression estimator (see [18] and [36]), denoted by p̃_n and given by
$$ \widetilde{p}_n(x) = \frac{\sum_{i=1}^{n} Y_i \psi_i^{-1} K\left(\frac{x - X_i}{h_n}\right)}{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h_n}\right)} \,\mathbb{1}_{\left\{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h_n}\right) \neq 0\right\}}. $$
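The recursive construction above can be sketched numerically. The following is an illustrative implementation, assuming a Gaussian product kernel, stepsize β_n = 1/n and bandwidth h_n = n^{−1/(d+4)}; these constants are assumptions for the sketch, not the paper's tuned choices.

```python
import numpy as np

def gaussian_kernel(u):
    """Product Gaussian kernel on R^d (integrates to one)."""
    return np.prod(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi), axis=-1)

def recursive_regression(x, X, Y, psi, beta0=1.0, c=1.0, a=None):
    """Recursive regression estimate p_n(x) = m_n(x)/f_n(x) built from
    m_n(x) = (1 - beta_n) m_{n-1}(x)
             + beta_n * (Y_n / psi_n) * h_n^{-d} * K((x - X_n)/h_n),
    with the companion density recursion for f_n, stepsize beta_n = beta0/n
    and bandwidth h_n = c * n^{-a} (illustrative constants)."""
    n, d = X.shape
    if a is None:
        a = 1.0 / (d + 4)      # bandwidth exponent; an assumed default
    m, f = 0.0, 0.0            # conventions m_0(x) = f_0(x) = 0
    for k in range(1, n + 1):
        beta = beta0 / k
        h = c * k ** (-a)
        kern = gaussian_kernel((x - X[k - 1]) / h) / h**d
        m = (1 - beta) * m + beta * (Y[k - 1] / psi[k - 1]) * kern
        f = (1 - beta) * f + beta * kern
    return m / f if f > 0 else 0.0

# Fully observed toy data (psi_i = 1): p(x) = E[Y | X = x] = x^2.
rng = np.random.default_rng(0)
n, d = 5000, 1
X = rng.uniform(-1, 1, size=(n, d))
Y = X[:, 0]**2 + rng.normal(scale=0.1, size=n)
print(recursive_regression(np.array([0.5]), X, Y, np.ones(n)))  # close to 0.25
```

With β_n = 1/n the recursion reduces to a running average of the kernel terms, which is why a single pass over the data suffices: adding a new observation does not require recomputing the whole estimate.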

Notations and assumptions
Throughout this paper, we use the notations introduced below. Before stating our assumptions, let us recall the definition of the class of regularly varying sequences introduced by Galambos and Seneta in [7].

Definition 1
Let (v_n)_{n≥1} be a non-random positive sequence and γ ∈ R. We say that (v_n) ∈ GS(γ) if
$$ \lim_{n \to \infty} n\left[1 - \frac{v_{n-1}}{v_n}\right] = \gamma. $$
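As a concrete check, for v_n = n^{−1} (the stepsize favored later) one has n[1 − v_{n−1}/v_n] = −n/(n − 1) → −1, so (n^{−1}) belongs to GS(−1). A minimal numerical verification:

```python
# A positive sequence (v_n) is in GS(gamma) when n * (1 - v_{n-1}/v_n) -> gamma.
# Check numerically that v_n = n^{-1} lies in GS(-1).
def gs_index(v, n):
    """The quantity n * (1 - v(n-1)/v(n)), which should approach gamma."""
    return n * (1 - v(n - 1) / v(n))

v = lambda n: 1.0 / n
print(gs_index(v, 10**6))   # approaches gamma = -1
```

The same check applied to v = lambda n: n ** (-a) gives values approaching −a, consistent with (n^{−a}) ∈ GS(−a).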

PSM TO DESCRIBE COGNITIVE PROCESSES AND MENTAL REPRESENTATIONS
In what follows, we state a lemma that will be widely used in the study of our estimator p_n. The proof of this lemma is given in [15].
(iii) g(s, t) is twice continuously differentiable with respect to s.
bounded function continuous at s = x.

Main results
In order to investigate the asymptotic properties of our estimator p_n, we first need the following two propositions, which provide the bias and the variance of m_n as well as those of f_n.

Proof
Throughout this proof, we use the following notations:

We have It follows that Moreover, we have Since A Taylor expansion of m around x ensures that Owing to the fact that m^{(2)}_{ij} is bounded and continuous at x for all i, j ∈ {1, . . . , d}, we thus get For the case a ≤ β/(d + 4), we have lim n→∞ (nβ_n) > 2a and then 1 − 2aξ > 0. The application of Lemma 1 ensures that We infer that For the case a > β/(d + 4), the application of Lemma 1 ensures that As a matter of fact, the result can be expressed as Let us now compute the variance of m_n(x). We state

Moreover, we have Hence, by a Taylor expansion for h, Thus, Therefore, the result is provided by For the case a < β/(d + 4), the application of Lemma 1 ensures that Our main result rests on the following theorem, which provides the bias and the variance of p_n.

Theorem 1
Let assumptions (A 1 ) − (A 3 ) hold and assume that, for all i, j ∈ {1, . . . , d}, m^{(2)}_{ij} and f^{(2)}_{ij} are continuous at x; we obtain If a ∈ (β/(d + 4), 1), then The bias and the variance of the estimator p_n defined by the stochastic approximation algorithm (4) thus heavily depend on the choice of the stepsize (β_n).

Proof
For this proof, let us note that, whenever f_n ≠ 0, we have with It follows from (19) that the asymptotic behavior of p_n(x) − p(x) can be deduced from that of A_n(x). Hence, we can state Since we already have the bias of m_n(x) as well as that of f_n(x), and since m(x) = p(x)f(x), we just need to combine the results (10), (11), (6) and (7) in order to obtain (15) and (16). Now, we have Let us now compute the covariance between m_n(x) and f_n(x). Indeed, we have Consequently, (17) and (18) follow from the combination of (12), (13), (8), (9) and (21). For the case a ≥ β/(d + 4), we can deduce Now, let us state the following theorem, which yields the asymptotic normality of the proposed multivariate recursive regression estimator under missing data p_n defined in (4).

Theorem 2
Let assumptions (A 1 ) − (A 3 ) hold and assume that, for all i, j ∈ {1, . . . , d}, m^{(2)}_{ij} and f^{(2)}_{ij} are continuous at x. We then have: where $\xrightarrow{\mathcal{D}}$ represents convergence in distribution and $\mathcal{N}$ denotes the Gaussian distribution.

Proof
We have On the one hand, it is clear that On the other hand, we apply Lyapunov's theorem to S_k(x). For this reason, we consider

Moreover, we have
Hence, by applying Lemma 1, it can be inferred that In addition, we have Therefore, Hence, We then deduce that In the following, let us assume that there is p > 0 such that By applying Lemma 1, we obtain Then, it follows that Moreover, since we have by applying Lyapunov's theorem, we get Moreover, (23) ensures that Then, the combination of (24) and (25) ensures that Hence, the application of Lyapunov's theorem coupled with the combination of (15), (16) and (26) ensures the convergence in (22).
In order to measure the asymptotic performance of the proposed recursive kernel regression estimator under missing data p_n, and to be able to use a data-driven bandwidth selection procedure by proposing asymptotically unbiased estimators of the unknown quantities, we consider the Mean Weighted Integrated Squared Error (MWISE), where the weight function is chosen equal to f^3(x). This choice is motivated by the fact that we can then propose asymptotically unbiased kernel estimators for the unknown quantities appearing in the MWISE, as reported previously in [29]; this will be detailed later.

Asymptotic expression of the MWISE of p_n
The MWISE of the estimator p_n is given by For simplicity, we set

It follows that
The corollary below ensures that the bandwidth which minimizes the MWISE of p_n depends on the choice of the stepsize (β_n), and that the corresponding MWISE then depends in turn on (β_n).

Corollary 1
Let assumptions (A 1 ) − (A 3 ) hold. To minimize the MWISE of p_n, the bandwidth (h_n) must be equal to

The corresponding MWISE is then given by
The following corollary is presented in the special case where (β_n) is chosen as (β_n) = (β_0 n^{−1}). One can easily check that the optimal choice of β_0 is β_0 = 1.

Corollary 2
Let assumptions (A 1 ) − (A 3 ) hold. To minimize the MWISE of p_n, we must choose the stepsize (β_n) in GS(−1) such that lim n→∞ (nβ_n) = 1. Consequently, the optimal bandwidth (h_n) must be equal to Thus, the corresponding MWISE is provided by

Asymptotic properties of the non-recursive estimator
The main properties of the non-recursive regression function estimator are displayed in the following proposition.

Proposition 3
Let assumptions (A 1 ) and (A 3 ) hold and assume that, for all i, j ∈ {1, . . . , d}, m^{(2)}_{ij} and f^{(2)}_{ij} are continuous at x. Then, the bias and the variance of Nadaraya-Watson's regression estimator are equal to:

Corollary 3
Let assumptions (A 1 ) and (A 3 ) hold. To minimize the MWISE of the non-recursive estimator, the bandwidth (h_n) must be equal to Then, the corresponding MWISE is specified by Clearly, the use of such a bandwidth (29) is not possible when we deal with real data, since it involves unknown quantities. From this perspective, the next section is devoted to building a data-driven bandwidth selection procedure, which will be helpful in practice.
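For comparison purposes, here is a minimal sketch of the non-recursive Nadaraya-Watson estimator with inverse-propensity weights; the Gaussian product kernel, the toy data and the unit bandwidth constant in h_n = n^{−1/(d+4)} are assumptions of the sketch, not the paper's data-driven choices.

```python
import numpy as np

def nadaraya_watson(x, X, Y, psi, h):
    """Non-recursive Nadaraya-Watson regression estimate at x, with
    inverse-propensity weights and a product Gaussian kernel."""
    u = (x - X) / h                                        # shape (n, d)
    K = np.prod(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi), axis=1)
    return (K * Y / psi).sum() / K.sum() if K.sum() > 0 else 0.0

# Toy check on fully observed data (psi_i = 1): p(x) = E[Y | X = x] = x^2.
rng = np.random.default_rng(1)
n, d = 5000, 1
X = rng.uniform(-1, 1, size=(n, d))
Y = X[:, 0]**2 + rng.normal(scale=0.1, size=n)
h = n ** (-1.0 / (d + 4))      # rate n^{-1/(d+4)}, constant taken as 1
print(nadaraya_watson(np.array([0.5]), X, Y, np.ones(n), h))   # close to 0.25
```

Unlike the recursive estimator, a single fixed bandwidth h_n is used for all n observations, so the whole sum must be recomputed whenever a new observation arrives.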

Bandwidth selection
Within the framework of non-parametric kernel estimation, the choice of the smoothing parameter is crucial for the good performance of the estimators. Many data-driven bandwidth selection methods are available in the literature; they can be divided into three broad classes: cross-validation techniques, plug-in methods, and the bootstrap approach. A detailed comparison of the three techniques is given in [6]. In this paper, building on the previous work conducted in [27][28][29] for unidimensional data, we propose a second-generation plug-in data-driven bandwidth procedure for regression estimation with multivariate data.

Plug-in bandwidth selection method
A widely used criterion consists in selecting a bandwidth that minimizes an estimate of the mean squared error, using the density function as a weight function. [2] proposed an efficient plug-in method of bandwidth selection. Since the MWISE depends on the unknown quantities I_j, j = 1, . . . , 5, we suggest constructing asymptotically unbiased estimators of those quantities. To do so, we adopt the approach proposed in [2], called second-generation plug-in estimation. For this purpose, we introduce a so-called pilot bandwidth (b_n)_{n≥1} ∈ GS(−δ), δ ∈ (0, 1).
In practice, we take b_n = n^{−δ} min{s, (Q_3 − Q_1)/1.349}, where s is the sample standard deviation and Q_1, Q_3 are the first and third quartiles. In order to select the parameter δ, we follow the work of [27][28][29].
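A sketch of this pilot-bandwidth computation follows; the 1.349 normalization (the interquartile range of a standard normal, the usual Silverman-type robust scale) and the value δ = 2/5 are assumptions of the sketch.

```python
import numpy as np

def pilot_bandwidth(x, delta=2/5):
    """Pilot bandwidth b_n = n^{-delta} * min{s, (Q3 - Q1)/1.349},
    where s is the sample standard deviation and Q1, Q3 the first and
    third quartiles. The 1.349 normalization and delta = 2/5 are
    assumptions of this sketch."""
    n = len(x)
    s = np.std(x, ddof=1)
    q1, q3 = np.percentile(x, [25, 75])
    return n ** (-delta) * min(s, (q3 - q1) / 1.349)

rng = np.random.default_rng(2)
sample = rng.normal(size=1000)
print(pilot_bandwidth(sample))   # roughly 1000^{-2/5} ≈ 0.06 for N(0,1) data
```

The min{s, IQR/1.349} term makes the scale estimate robust: for heavy-tailed samples the interquartile-range term is smaller than s and prevents the pilot bandwidth from being inflated by outliers.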
First of all, for the sake of simplicity, the kernel K is considered as a product of univariate kernels.

Stat., Optim. Inf. Comput. Vol. 10, September 2022

Estimation of I_4 and I_5: we consider the following kernel estimators to estimate I_4 and I_5, respectively: where K_b is a kernel and b_n is the associated bandwidth, such that δ = 2/5. Hence, the plug-in bandwidth selection estimator of (29) is given by with It follows that the plug-in non-recursive estimator of MWISE[p_n] is equal to

Application to the handwritten word production
Research on handwritten word production aims to describe the cognitive processes and mental representations mobilized when a human being prepares to handwrite a word from an idea (see [21]). One of the most widely used tasks to experimentally explore these issues is object naming: participants have to produce, in handwriting and as quickly as possible, the words corresponding to the names of a set of drawings. It is generally accepted that handwritten object naming involves four levels of processing [19]. First, a perceptual analysis of the visual input is performed, which results in the activation of stored structural knowledge about the object. A second processing level corresponds to the retrieval of semantic/conceptual information. The lexical selection level then makes orthographic word-form information available. Finally, the motor programming level allows access to the motor codes corresponding to each produced letter.
These theoretical propositions concerning the cognitive processes and representations involved in handwritten object naming stem from studies aiming at finding predictors of reaction times (RTs hereafter), i.e., the time between the presentation of the image and the first graphic movement (e.g., [3]; [20]). Four factors have been reported to significantly influence RTs, each of which indexes a specific processing level. Image Agreement (IA) captures the similarity between the structural representations stored in memory and the visual characteristics of an object's drawing; this factor mainly influences the perceptual analysis level. IA is measured on a Likert scale, generally with five points, from '1 - weakly similar' to '5 - strongly similar'. A negative linear relationship is observed between this variable and the RTs (see [3]; [20]). Image variability (Ivar) is designed to index the 'richness' of semantic representations. Like IA, it is rated on a 5-point scale, from 1 = few images to 5 = many images. A negative linear relationship is reported between handwritten RTs and Ivar (see [3]; [20]).
Name agreement (NA) refers to the degree of agreement on the use of a specific label for an image, measured through an entropy measure (the h-index). A positive linear relationship is reported between RTs and the h-index (see [3]; [20]). NA indexes the influence of the number of correct alternative names existing for an image (e.g., couch => sofa): latencies are more or less impacted by the time needed to manage the competition between the higher or lower number of alternatives during lexical access. Finally, the influence of age-limited learning (Age of Acquisition, AoA) has been systematically emphasized in studies on the predictors of handwritten RTs (see [3]; [20]). AoA is usually measured on a Likert scale (from 1 = learned at 0-3 years to 5 = learned after age 12, with 3-year bands in between), with a population of young adults asked to estimate the age at which they learned the proposed word. A positive linear relationship is observed between the RTs and the rated values of AoA (see [3]; [20]). Experimental work [22] suggests that this variable influences the orthographic word-form encoding processes. The major target of this work is to classify the participants into clusters. From this perspective, we first have to estimate the regression function, i.e., the relation between the variable T = RTs and the four covariates X_1 = H, X_2 = IA, X_3 = Ivar and X_4 = AoA. Since the response variable RTs is subject to missing data, we introduce a corrected variable Y := CRTs defined by Y_i = T_i · 1_{T_i is observed}. Here, we have N_p individual estimators, one for each participant, and a general estimator which estimates the whole database of N_p participants. It is worth noting that, for each participant/covariate behavior test, we used a specific method for bandwidth selection, namely plug-in univariate selection for multivariate data.
This implies that, instead of opting for a single bandwidth value h_n, we considered a vector (h_n1, . . . , h_nd), i.e., an individual bandwidth for each covariate. For the recursive case, we thus have a matrix of bandwidths. We denote by p*_i the reference regression vector and by p_i the test regression, and we calculate the two following measures: the Mean Squared Error (MSE) and the Mean Relative Error (MRE).

Algorithm 1. X_1, . . . , X_4 are the covariates, with X_1 = H, X_2 = IA, X_3 = Ivar and X_4 = AoA; Y is the response variable, with Y = CRTs; K is the Gaussian kernel; n is the number of items and N_p is the number of participants. Input: Y, X_1, . . . , X_4, K, n and N_p.
Let us underline that, in order to classify participants into groups, we use the MSE as a reference vector. We then use the k-means method to specify the maximum number of needed clusters.
for the non-recursive estimation). 4: end for. 5: Classification of the computed distances, through the kmeans package in R. Output: the classification list using both considered estimators.
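The classification step can be sketched as follows. This is an illustrative Python analogue of the R kmeans call, applied to hypothetical per-participant MSE values (the study's actual data are not reproduced here); the quantile-based initialization is an assumption made to keep the toy run deterministic.

```python
import numpy as np

def kmeans(data, k=3, iters=100):
    """Minimal Lloyd's algorithm on an array of shape (n_points, n_features).
    Initial centers are taken at empirical quantiles (an assumption of this
    sketch) so that the toy run below is deterministic."""
    centers = np.quantile(data, np.linspace(0.1, 0.9, k), axis=0)
    for _ in range(iters):
        dist = ((data[:, None, :] - centers[None, :, :])**2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Hypothetical per-participant MSE values: three well-separated groups of 10.
rng = np.random.default_rng(3)
mse = np.concatenate([rng.normal(m, 0.02, 10) for m in (0.1, 0.5, 1.0)])
labels = kmeans(mse.reshape(-1, 1), k=3)
print(sorted(set(labels.tolist())))   # three recovered clusters: [0, 1, 2]
```

Each participant's label then indicates which of the variation groups their distance to the reference regression falls into.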

Result Analysis:
From Figure 2 and Table 1, we deduce that the proposed recursive estimator outperforms the non-recursive one in terms of mean relative error. Meanwhile, Figures 1 and 3 indicate that it is advisable to consider three clusters. As far as written production behavior is concerned, this implies that the classification procedure suggests three clusters to measure the distance of each participant from the reference. In other words, three forms of variation can be observed when participants have to write the label of a drawing. Further exploration of the available characteristics of the participants suggests that demographic factors such as age and gender do not account for the clustering result. Descriptive analysis of executive function task data suggests that there are differences between the three groups of participants. This indicates that the variations can be explained in part by the participants' cognitive processing ability and by differences in the mobilization of their executive functions. Studies based upon procedures for fitting reaction time distributions with ex-Gaussian probability densities (the convolution of a normal and an exponential law) have corroborated the role of these executive functions in simple tasks (e.g., [26]; [35]). Our analyses show that this result can be extended to more complex activities such as written production. Finally, this work confirms the relevance of non-parametric regression for modeling behavior in the area of experimental psychology.

Conclusion
In this paper, we elaborated a multivariate recursive regression estimator under missing data. We first investigated the asymptotic properties of the proposed estimator by providing its bias and variance, and demonstrated that it asymptotically follows a normal distribution. Subsequently, we compared our recursive estimator with the non-recursive multivariate Nadaraya-Watson regression estimator using the plug-in bandwidth selection approach. In our application to a real dataset, and in all cases, the proposed estimator (4) with stepsize (β_n) = (n^{−1}) yielded smaller MSE and MRE than the non-recursive Nadaraya-Watson estimator. As part of the application, we estimated the response variable RTs (reaction times) from the other covariates and classified the participants into clusters according to how close their estimates come to the real values of RTs. To conclude, the use of the multivariate recursive kernel regression estimator under missing data enabled us to obtain better results than the multivariate non-recursive kernel regression estimator under missing data: with an appropriate choice of the bandwidth, our proposed estimator is closer to the true regression function than the non-recursive one.
A future research direction would be to extend our findings to the case of functional data, as in [32] and [33]. We could also consider k-nearest-neighbour smoothing, see [1] for finite-dimensional data and [10] for the case of functional data. Another direction is to consider similar estimators based on bias reduction techniques (see [9], [31]), which requires non-trivial mathematics and therefore goes beyond the scope of the present paper. Finally, we could explore the idea developed in the recent work [4] by considering semi-parametric Bayesian network approaches based on the current work.