Statistical inference for multivariate conditional cumulative distribution function estimation by stochastic approximation method

This paper deals with non-parametric estimation of a conditional cumulative distribution function (CCDF). Using a recursive approach, we suggest a multivariate recursive estimator defined by a stochastic approximation algorithm. Our purpose is to study the statistical inference of our estimator and to compare it with that of the non-recursive Nadaraya-Watson estimator. To this end, we first derive the asymptotic properties of the proposed estimator, which strongly depend on the choice of two parameters: the stepsize (γ_n) and the bandwidth (h_n). The second-generation plug-in method, a bandwidth selection method minimizing the Mean Weighted Integrated Squared Error (MWISE) of the estimator in question, leads to the optimal choice of the bandwidth and then provides an appropriate choice of the stepsize parameter. We corroborate our theoretical results through simulation studies and a real dataset, and show that, under some conditions, the Mean Squared Error (MSE) of the proposed estimator can be smaller than that of the Nadaraya-Watson estimator.


Introduction
Assume that we observe independent identically distributed vectors (X_1, Y_1), ..., (X_n, Y_n) of a bivariate random variable (X, Y) with common cumulative distribution function π(x, y), where one is interested in modelling the functional dependence of the observation Y on the covariate X by the conditional cumulative distribution function (CCDF) of Y given X = x, denoted, for all real y and x, by π(y|x) := P[Y ≤ y | X = x].
We will assume that the bivariate random variable (X, Y) (resp. the random variable X) has a density function f_{(X,Y)} (resp. f_X) with respect to the Lebesgue measure. Recall that for all real y and x such that f_X(x) ≠ 0, the CCDF of Y given X = x is defined by

π(y|x) = (1 / f_X(x)) ∫_{-∞}^{y} f_{(X,Y)}(x, u) du.

In a variety of non-parametric statistical problems, the estimation of a CCDF is a key aspect of inference. Recall that the CCDF has the advantage of characterizing the conditional law of the considered bivariate random variables. In particular, the CCDF is often useful in reliability or survival analysis. More specifically, the conditional survival function S(y|x), defined for all real y and x by S(y|x) := 1 − π(y|x), is of interest, either by itself or through its relation to the conditional hazard function h(y|x), defined for all real y and x by h(y|x) := f(y|x) / S(y|x), where f(y|x) denotes the conditional density of Y given X = x. Furthermore, conditional quantiles can also be deduced from the CCDF π by (pseudo-)inversion, for a given x, of the function y → π(y|x), and the same procedure may be applied to an estimator of the CCDF to obtain conditional quantile estimators.
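Since π(·|x) is non-decreasing in y, the pseudo-inversion mentioned above amounts to a generalized inverse, which can be evaluated on a grid. The following sketch is purely illustrative (the function name, the grid, and the use of the standard normal distribution function as a stand-in CCDF are our own assumptions, not part of the paper's procedure):

```python
import numpy as np
from math import erf, sqrt

def ccdf_quantile(y_grid, ccdf_values, p):
    """Generalized inverse inf{y : pi(y|x) >= p} evaluated on a grid.

    y_grid      -- increasing grid of candidate y values
    ccdf_values -- CCDF estimate evaluated at y_grid (non-decreasing)
    p           -- probability level in (0, 1)
    """
    idx = np.searchsorted(ccdf_values, p, side="left")
    idx = min(idx, len(y_grid) - 1)  # clamp when p exceeds the tabulated maximum
    return float(y_grid[idx])

# Stand-in CCDF: the N(0, 1) distribution function, whose 0.5-quantile is 0.
y_grid = np.linspace(-4.0, 4.0, 801)
cdf = np.array([0.5 * (1.0 + erf(y / sqrt(2.0))) for y in y_grid])
median = ccdf_quantile(y_grid, cdf, 0.5)
```

Applied to an estimated CCDF, the same pseudo-inverse yields conditional quantile estimators, as noted above.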
Several non-parametric estimators have been proposed to estimate the CCDF. Many of them are based on first estimating the quantity ∫_{R^d} 1{u ≤ y} f_{(X,Y)}(x, u) du. The conditional cumulative distribution function was first extensively studied by [Stute, 1986] using a nearest-neighbor-type conditional empirical process. Thereafter, [Hall et al., 1999], motivated by the problem of setting prediction intervals in time series analysis, proposed a new non-parametric method for CCDF estimation based on an adjusted form of the Nadaraya-Watson estimator. Later, [Ferrigno et al., 2014] established uniform asymptotic certainty bands for the CCDF using the same strategy.
For a general non-parametric regression model, [Kiwitt and Neumeyer, 2012] suggested two estimators using a kernel approach (see their Theorems 1 and 2), where the distribution of the error ε given the covariate is modelled by the CCDF P(ε ≤ y | X = x) (see model (3), page 260).
On a given compact set, [Brunel et al., 2010] constructed a minimax estimator of the CCDF. Thereafter, [Veraverbeke et al., 2014] gave a new estimator of the CCDF using a method of pre-adjusting the original observations non-parametrically. Recently, [Bouanani et al., 2020] discussed a novel method for the CCDF estimation problem based on a local polynomial technique.
The main purpose of this paper is to provide a non-parametric strategy to estimate the CCDF. We build a multivariate estimator of the CCDF through a recursive approach, using the estimators introduced by [Mokkadem et al., 2009b] and [Slaoui, 2014b], and then establish the asymptotic properties of the proposed estimator.
The paper is organized as follows. In Section 2 we introduce our estimation procedure for the CCDF. The main results for our recursive estimator are given in Section 3. Section 4 deals with the asymptotic properties of the non-recursive Nadaraya-Watson estimator. Section 5 displays the performance of our estimator on simulated data as well as a real dataset. The conclusion is given in Section 6, and the proofs of our main results are gathered in Section 7. Finally, a second-generation plug-in scheme for constructing data-driven bandwidth selection procedures is proposed in Appendix A.

Preliminaries
Let (X, Y) ∈ R^d × R^d and let (X_1, Y_1), ..., (X_n, Y_n) be independent random vectors identically distributed as (X, Y), with joint density function f_{(X,Y)}; let f_X denote the probability density of X.
In this paper, we will mostly focus on the problem of estimating the CCDF of Y given X = x, given by π. Here we introduce our recursive estimator π_n, defined by

π_n(y|x) = a_n(x, y) / f_n(x)  whenever f_n(x) ≠ 0  (and 0 otherwise),   (1)

with

a_n(x, y) = Π_n Σ_{k=1}^{n} Π_k^{-1} γ_k h_k^{-d} χ_k(y) K((x − X_k)/h_k),

where:

χ_k is a multivariate indicator function defined by χ_k(y) := 1{Y_k ≤ y} (componentwise);

(h_n) is the bandwidth: a sequence of positive real numbers that tends to zero;

(γ_n) is the stepsize: a positive sequence of real numbers decreasing towards zero, and Π_n = Π_{j=1}^{n} (1 − γ_j).
Our aim is to study the asymptotic properties of the proposed multivariate estimator of the CCDF and to demonstrate its performance.
We will compare our estimator to the generalized Nadaraya-Watson kernel CCDF estimator [Nadaraya, 1964], [Watson, 1964] π_n, defined by

π_n(y|x) = a_n(x, y) / f_n(x)  whenever f_n(x) ≠ 0  (and 0 otherwise),   (2)

with

a_n(x, y) = (1/(n h_n^d)) Σ_{k=1}^{n} χ_k(y) K((x − X_k)/h_n).

The recursive estimator was constructed by means of a stochastic approximation method. In fact, the use of stochastic approximation algorithms in the context of non-parametric statistics goes back to the papers [Robbins and Monro, 1951] and [Kiefer and Wolfowitz, 1952]. Their work has been extended in several directions; we refer the reader to [Blum, 1954], [Nadaraya, 1964], [Ruppert, 1982], and [Duflo, 1997].
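For concreteness, here is a minimal univariate sketch of such a Nadaraya-Watson-type CCDF estimate; the Gaussian kernel and the function name are our own illustrative assumptions. Note that the (n h_n^d)^{-1} factors cancel in the ratio a_n/f_n, so only the kernel weights matter:

```python
import numpy as np

def nw_ccdf(x, y, X, Y, h):
    """Nadaraya-Watson-type estimate of P(Y <= y | X = x), Gaussian kernel.

    X, Y -- 1-d arrays of observations (univariate case)
    h    -- bandwidth h_n
    """
    w = np.exp(-0.5 * ((x - X) / h) ** 2)   # kernel weights K((x - X_k) / h)
    chi = (Y <= y).astype(float)            # indicators chi_k(y) = 1{Y_k <= y}
    denom = w.sum()
    return float((w * chi).sum() / denom) if denom > 0 else 0.0
```

For example, when all X_k coincide with x the weights are equal and the estimate reduces to the empirical fraction of observations with Y_k ≤ y.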
Later, in [Mokkadem et al., 2009b], the multidimensional case was studied for the estimation of a multivariate probability density, including estimation by confidence intervals. Moreover, [Slaoui, 2014b] reused stochastic approximation methods to improve the qualities of the distribution function estimator. Recently, a conditional density estimator was given in [Slaoui and Khardani, 2020].
To build a stochastic algorithm which approximates the function a : (x, y) → ∫_{R^d} 1{u ≤ y} f_{(X,Y)}(x, u) du at a given couple of vectors (x, y), we define an algorithm searching for the zero of the function φ : z → a(x, y) − z, and we set:

(i) a_0(x, y) ∈ R;

(ii) for all n ≥ 1, a_n(x, y) = a_{n−1}(x, y) + γ_n U_n(x, y),

where U_n(x, y) is an observation of the function φ at the point a_{n−1}(x, y).
We point out that, to define U_n(x, y), we follow the approach of [Révész, 1977] and [Tsybakov, 1990].
By taking

U_n(x, y) = χ_n(y) h_n^{-d} K((x − X_n)/h_n) − a_{n−1}(x, y),

the stochastic approximation algorithm that we present to estimate the function a recursively at a couple of vectors (x, y) can be written as follows:

a_n(x, y) = (1 − γ_n) a_{n−1}(x, y) + γ_n χ_n(y) h_n^{-d} K((x − X_n)/h_n).   (3)

Throughout this paper, we consider that a_0(x, y) = 0; then, by recurrence, we get

a_n(x, y) = Π_n Σ_{k=1}^{n} Π_k^{-1} γ_k χ_k(y) h_k^{-d} K((x − X_k)/h_k).

In this setting, we use the recursive multivariate probability density estimator f_n of the density function f_X defined in [Mokkadem et al., 2009b], which was constructed with the same stochastic approximation tools and under the condition that f_0(x) = 0.

Throughout this paper, we use the notation

a_{ij}^{(2)}(·, y) := ∂²a(·, y) / ∂x_i ∂x_j,   i, j ∈ {1, ..., d}.

In the sequel, let us recall the following definition of the class of regularly varying sequences introduced by [Galambos and Seneta, 1973].
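The recursions for a_n and for the density estimator f_n can be run jointly in a single pass over the data. The sketch below is univariate with a Gaussian kernel; all names are our own illustrative assumptions, and it starts from a_0 = f_0 = 0 as assumed above:

```python
import numpy as np

def recursive_ccdf(x, y, X, Y, gammas, hs):
    """Stochastic-approximation recursions (univariate sketch):
         a_n = (1 - g_n) a_{n-1} + g_n * 1{Y_n <= y} * K((x - X_n)/h_n) / h_n
         f_n = (1 - g_n) f_{n-1} + g_n * K((x - X_n)/h_n) / h_n
       started from a_0 = f_0 = 0; returns pi_n(y|x) = a_n / f_n."""
    K = lambda u: np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    a = f = 0.0
    for Xn, Yn, g, h in zip(X, Y, gammas, hs):
        kern = K((x - Xn) / h) / h
        a = (1.0 - g) * a + g * (Yn <= y) * kern
        f = (1.0 - g) * f + g * kern
    return a / f if f > 0 else 0.0
```

With the choice (γ_n) = (n^{-1}), the recursion reduces to a running average of the kernel terms, which is what makes the estimator updatable online when a new observation arrives.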
Definition 1. Let (v_n) be a nonrandom positive sequence and γ ∈ R. We say that (v_n) ∈ GS(γ) if

lim_{n→+∞} n [1 − v_{n−1}/v_n] = γ.

In order to introduce our theoretical main results, we need the following technical assumptions.

Assumptions:

(ii) a_{ij}^{(2)} is bounded;

(iii) f_{(X,Y)} is bounded and twice continuously differentiable with respect to x.
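Definition 1 characterizes GS(γ) via the limit n(1 − v_{n−1}/v_n) → γ; in particular, a power sequence v_n = n^c belongs to GS(c). A quick numerical check (purely illustrative; the function name is ours) for a bandwidth-type sequence h_n = n^{-1/4}:

```python
def gs_ratio(v, n):
    """The quantity n * (1 - v(n-1)/v(n)) whose limit defines the GS class."""
    return n * (1.0 - v(n - 1) / v(n))

# h_n = n^(-1/4) should lie in GS(-1/4): the ratio tends to -0.25.
h = lambda n: n ** (-0.25)
approx = gs_ratio(h, 10**6)
```

This is the sense in which typical stepsizes (γ_n) ∈ GS(−α) and bandwidths (h_n) ∈ GS(−a) are regularly varying.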

Discussion of the assumptions:
Obviously, all these assumptions are very standard and are usually assumed in the context of non-parametric estimation. The classical assumption (A1) provides regularity conditions on the kernel density estimator introduced by [Rosenblatt, 1956] and [Parzen, 1962], which is widely used in the non-parametric framework for functional estimation. Assumption (A2) on the stepsize and the bandwidth was used in the recursive framework for the estimation of the density function in [Mokkadem et al., 2009b, Slaoui, 2014a] and for distribution function estimation in [Slaoui, 2014b]. Furthermore, assumption (A2)(iii) on the limit of (nγ_n) as n goes to infinity is usual in the framework of stochastic approximation algorithms; it implies in particular that the limit of ((nγ_n)^{-1}) is finite. The conditions in (A3) are technical assumptions imposed in order to keep the proofs brief. Such conditions on the density of the couple (X, Y) were used in the non-recursive framework for the estimation of the regression function [Nadaraya, 1964, Watson, 1964] and in the recursive framework [Mokkadem et al., 2009a, Slaoui, 2015].
In this section, we recall the following proposition, which gives the bias and the variance of f_n. The proof of this result was given in [Mokkadem et al., 2009b].
Bias and variance of f_n:

3 Main results

In order to study the asymptotic properties of our estimator π_n, we first introduce the following proposition, which gives the bias and the variance of a_n.

3.1 Bias and variance of a_n

Proposition 2. If a ∈ (α/(d + 4), 1), then

In the following theorem, we introduce our main result, which gives the bias and the variance of our multivariate CCDF estimator π_n.

3.2 Bias and variance of π_n

Theorem 1. If a ∈ (α/(d + 4), 1), then

We call attention to the fact that the bias and the variance of the estimator π_n defined by the stochastic approximation algorithm (1) heavily depend on the choice of the stepsize (γ_n).
In the sequel, let us state the following theorem, which gives the asymptotic normality of our recursive estimator π_n. Throughout this paper, we will denote convergence in distribution by D→ (as n → +∞) and the Gaussian distribution by N.

3.3 Asymptotic normality of π_n

Remark 1. Note that in the particular case where c = 0, the expression of the distribution convergence rate is given as follows:

In order to measure the asymptotic quality of the recursive CCDF estimator π_n, we introduce the Mean Weighted Integrated Squared Error (MWISE).

3.4 Asymptotic expressions of MWISE[π_n]

Proposition 3. The MWISE of the estimator π_n is given as follows.

The following corollary ensures that the bandwidth which minimizes the MWISE of π_n depends on the choice of the stepsize (γ_n), and that the corresponding MWISE therefore also depends on (γ_n).

Corollary 1. Let Assumptions (A1)-(A3) hold. Then, the corresponding MWISE is given by

The following corollary is given in the special case where (γ_n) is chosen as (γ_n) = (γ_0 n^{-1}) in order to minimize the MWISE[π_n].
In order to find the optimal choice of (γ_n), we deduce that the minimum of MWISE[π_n] is reached at γ_0 = 1. We introduce the following corollary.

Corollary 2. Consequently, the corresponding MWISE is given by

The statistical inference of the non-recursive multivariate CCDF estimator π_n is established in the next section.
The following results can be handled in much the same way as for the recursive estimator; the only difference is that they concern the non-recursive case (see [Hall et al., 1999] for more details on the univariate case).

4 Asymptotic properties of the non-recursive estimator π_n

In order to study the asymptotic properties of the non-recursive estimator π_n, we first introduce the following proposition, which gives its bias and variance.

4.1 Bias and variance of π_n

Proposition 4. Let Assumptions (A1) and (A3) hold. Then the bias and the variance of the Nadaraya-Watson estimator are given as follows.

The following theorem gives the distribution convergence rate of the non-recursive estimator.

4.2 Asymptotic normality of π_n

Theorem 3. Let Assumptions (A1)-(A3) hold and suppose that nh_n^{d+4}

In the next subsection, we state the expression of the MWISE of the non-recursive estimator.

Proposition 5. Let Assumptions (A1) and (A3) hold. To minimize the MWISE of π_n, the bandwidth (h_n) must be equal to

Then, the corresponding MWISE is given by

Numerical applications
The aim of our numerical studies is to compare the performance of our recursive estimator (1) with that of the non-recursive Nadaraya-Watson estimator (2).

Bandwidth selection
Although the theoretical asymptotic study yields the optimal bandwidth, the density function is unknown in practice, which makes this bandwidth hard to use directly. Kernel smoothing in non-parametric statistics therefore requires the choice of a bandwidth parameter. This choice is crucial for obtaining a good rate of consistency of the kernel estimators, and it has a strong influence on the size of the bias. One has to find an appropriate bandwidth that produces a good balance between the bias and the variance of the estimators of the function a(·, ·) and of f(·). It is worth noticing that the bandwidth selection methods studied in the literature can be divided into three broad classes: cross-validation techniques, plug-in ideas, and bootstrap procedures. In this work, we are interested in the plug-in method. [Altman and Léger, 1995] proposed an efficient method of bandwidth selection, which minimizes an estimate of the mean weighted integrated squared error, using the density function as a weight function. For this, we followed the work of [Slaoui, 2014a]; see Appendix A for more details.
Let us start our numerical application with some simulations in the univariate case.

Simulation studies
With the purpose of comparing the proposed recursive estimator and the non-recursive Nadaraya-Watson estimator, we consider three sample sizes (n = 100, 200 and 500), a fixed number of simulations (N = 500), and three distribution models:

• Model 1: Y = 2 sin(πX) + ε, where X follows the binomial distribution B(2, 1/3) and ε follows the normal distribution N(0, 1).
We denote by π*_i the reference CCDF and by π_i the test CCDF; we then compute the following measures:

· the mean squared error (MSE);

· the linear correlation (Cor).
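The two comparison measures can be computed directly from the vectors of reference and estimated CCDF values; the sketch below (function name ours) assumes the standard definitions of the MSE and of Pearson's linear correlation:

```python
import numpy as np

def mse_and_cor(pi_ref, pi_test):
    """Mean squared error and linear correlation between the reference CCDF
    values pi*_i and the test CCDF values pi_i on a common evaluation grid."""
    r = np.asarray(pi_ref, dtype=float)
    t = np.asarray(pi_test, dtype=float)
    mse = float(np.mean((r - t) ** 2))
    cor = float(np.corrcoef(r, t)[0, 1])
    return mse, cor
```

A small MSE together with a correlation close to 1 indicates that the test CCDF tracks the reference CCDF closely.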
In the following, we present the different steps of the simulation algorithm.

Simulation Algorithm
Algorithm 1. K is the Gaussian kernel, n the sample size, N_p the number of observations, and N the number of iterations.

Input: K, n, N_p and N.

6: We fix x and consider an arbitrary sampling vector T of y values of length N_p.

7: Compute the univariate recursive CCDF estimator at (T, x) (resp. the univariate non-recursive CCDF estimator) and store the values Π(l) = π_l(T|x).

8: end for

Output: the vectors π for both estimators.

Table 1: Quantitative comparison between the Nadaraya-Watson estimator and the proposed estimator with stepsize (γ_n) = (n^{-1}) through a plug-in method, in the unidimensional case, for Model 1.

Table 2: Quantitative comparison between the recursive estimator and the non-recursive one with stepsize (γ_n) = (n^{-1}) through a plug-in method, in the unidimensional case, for Model 2.

Table 3: Quantitative comparison between the recursive estimator and the non-recursive one with stepsize (γ_n) = (n^{-1}) through a plug-in method, in the unidimensional case, for Model 3.
1 - The MSE of the proposed recursive estimator with stepsize (γ_n) = (n^{-1}) through a plug-in method is smaller than that of the non-recursive Nadaraya-Watson estimator.
2 - The MSE decreases and the correlation Cor increases as the sample size increases.

Table 4: Quantitative comparison between the Nadaraya-Watson estimator and the proposed estimator with stepsize (γ_n) = (n^{-1}) through a plug-in method for the Insurance Company Benchmark (COIL 2000) dataset.

Conclusion
In this paper, we proposed a multivariate recursive CCDF estimator. We first studied the asymptotic properties of the proposed estimator: we gave its bias as well as its variance in order to show that it asymptotically follows a normal distribution. We then compared our recursive estimator with the non-recursive multivariate Nadaraya-Watson estimator using the plug-in bandwidth selection approach. In our numerical applications, the proposed recursive CCDF estimator with stepsize (γ_n) = (n^{-1}) gave a smaller MSE than the non-recursive Nadaraya-Watson CCDF estimator.

7 Proofs

This section is devoted to the proofs of our main results. For this, we introduce a lemma that will be widely used in the study of our estimator π_n.
Moreover, for any positive sequence (α_n) such that lim

We point out that the proof of this lemma was given in [Mokkadem et al., 2009b].
According to Assumption (A1), we have

Moreover, a Taylor expansion with integral remainder ensures that

a_{ij}^{(2)}(x − tzh_k, y) z_i z_j K(z) dt dz.

On account of Assumption (A1),

a_{ij}^{(2)}(x, y) z_i z_j K(z) dt dz.
Since a_{ij}^{(2)} is bounded and continuous at x for all i, j ∈ {1, ..., d}, we obtain lim_{k→+∞} η_k(x) = 0, which ensures that η_k(x) = o(1). Thus:

• For the case a ≤ α/(d + 4), we have lim_{n→+∞} nγ_n > 2a and then 1 − 2aξ > 0. The application of Lemma 1 enables us to write

We thus obtain the desired result E[a_n(x, y)] − a(x, y) =

• For the case a > α/(d + 4), we have lim_{n→+∞} nγ_n > (α − ad)/2, which gives h_n^2 = o(√(γ_n h_n^{-d})). Then the use of Lemma 1 leads to E[a_n(x, y)] − a(x, y) =

Hence, the claimed result (11) is established.
For the variance, due to the independence of the X_i, i = 1, ..., n, it is clear that Var[a_n(x, y)] = Π_n^2 Σ_{k=1}^n

Therefore, Taylor expansions ensure that

Var[a_n(x, y)] = Π_n^2 Σ_{k=1}^n

where

• For the case a < α/(d + 4), we have lim_{n→+∞} nγ_n > 2a, which gives γ_n h_n^{-d} = o(h_n^4). By applying Lemma 1 we infer that

Var[a_n(x, y)] = Π_n^2 Σ_{k=1}^n

Hence we obtain the result Var[a_n(x, y)] = o(h_n^4).
• For the case a ≥ α/(d + 4), we have lim_{n→+∞} nγ_n > (α − ad)/2; this leads to the result given in (13).

7.2 Proof of Theorem 1

Our proof relies on the following decomposition, valid for f_n(x) ≠ 0, n ≥ 0,

with

It follows from (21) that the asymptotic behaviour of π_n(y|x) − π(y|x) can be deduced from that of A_n(x, y).
1. Bias of π_n: Here, we can write

Now, using the first bias part of Propositions 1 and 2, and considering the fact that a(x, y) = π(y|x) f_X(x), we combine assertions (6), (7), (10) and (11) to obtain the relations (14) and (15).
For the case a ≥ α/(d + 4), we can deduce with (22) that

with

The same reasoning applied to the case a < α/(d + 4) gives the desired result (16). Now, let us recall the precise statement of Lyapunov's theorem, which will be used in the next proof.
Theorem 4 (Lyapunov). Let (X_n) be a sequence of independent centered random variables, all having a finite moment of order 2 + δ. We note u_n^2 = Var(Σ_{k=1}^n X_k). Under the following Lyapunov condition, for p > 0,

Proof of Theorem 2:
We can write, for all n ≥ 0 and all x, y ∈ R^d,

This proof falls naturally into two steps. Here and subsequently, we set

First step: It is clear that

Second step: We apply Lyapunov's Theorem 4 to S_k(x, y).
On the one hand, we can write

Moreover, we have
Thereafter, by applying Lemma 1, it can be deduced that

On the other hand, we have

Then we can write

Therefore,

Hence,

In the following, for the application of Lemma 1, let us assume that there exists a positive real p such that

lim_{n→+∞} nγ_n > ((1 + p)/(2 + p)) (α − ad).
Then we obtain

Given that

and by replacing u_n with its value in relation (26), we deduce that

Proof of Proposition 3: We first introduce the MWISE expression:

Based on relation (28), and by distinguishing the different possible cases according to the expressions of the bias ((14) and (15)) and the variance ((16) and (17)), one can prove this proposition and find the required result.
A Plug-in

As a result of the plug-in procedure, based on the expression of the MWISE, we estimate the unknown quantities I_1 and I_2 by proposing asymptotically unbiased estimators; this process is known as a plug-in estimate. For this, we introduce (b_n) ∈ GS(−δ), with δ ∈ (0, 1). In practice, [Altman and Léger, 1995] take

b_n = n^{-δ} min(s, (Q_3 − Q_1)/1.349),

where s is the sample standard deviation and Q_1, Q_3 are the first and third quartiles.
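This pilot bandwidth can be computed directly from the sample; the sketch below follows the displayed rule (the function name and the default value of δ are our own choices within the stated range δ ∈ (0, 1)):

```python
import numpy as np

def pilot_bandwidth(sample, delta=0.2):
    """Altman-Leger-type pilot bandwidth b_n = n^(-delta) * min(s, (Q3 - Q1)/1.349)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    s = float(x.std(ddof=1))                 # sample standard deviation
    q1, q3 = np.percentile(x, [25.0, 75.0])  # first and third quartiles
    return n ** (-delta) * min(s, (q3 - q1) / 1.349)
```

The min(s, IQR/1.349) term is the usual robust scale estimate: for Gaussian data both arguments estimate the standard deviation, while for heavy-tailed data the interquartile term is less sensitive to outliers.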
In the following, and for the sake of simplicity, the kernel K we shall use is taken as a product of univariate kernels, each satisfying ∫_R K(x) dx = 1.
In addition, we note: to estimate the optimal bandwidth (20), we need to estimate I_1 and I_2. Here we can write

a_n(x, y) = (1/(n h_n^d)) Σ_{k=1}^n χ_k(y) K((x − X_k)/h_n) = (1/(n h_n^d)) Σ_{k=1}^n Π_{j=1}^d