Approximations of the solutions of a stochastic differential equation using Dirichlet process mixtures and Gaussian mixtures

Stochastic differential equations arise in a variety of contexts. Many techniques exist for approximating the solutions of these equations, including statistical methods, numerical methods, and other approximations. This article implements a parametric and a nonparametric method to approximate the probability density of the solutions of a stochastic differential equation from observations of a discrete dynamic system. To achieve these objectives, Dirichlet process mixtures and Gaussian mixtures are considered. The methodology uses computational techniques based on Gaussian mixture filters, nonparametric particle filters and Gaussian particle filters to establish the relationship between the theoretical processes of unobserved and observed states. The approximations obtained by this proposal are attractive because the hierarchical structures used for modeling are flexible, easy to interpret and computationally feasible. The methodology is illustrated by means of two problems: the synthetic Lorenz model and a real rainfall model. Experiments show that the proposed filters provide satisfactory results, demonstrating that the algorithms work well in the context of processes with nonlinear dynamics which require the joint estimation of states and parameters. The estimated measure of goodness of fit validates the estimation quality of the algorithms.


Introduction
Stochastic differential equations are used in many applications related to basic science, such as the modeling of biological, chemical, physical, environmental, engineering, economic and financial processes, among others ([36]). A stochastic differential equation describes the time evolution of the dynamics of a state vector, based on physical observations from a real system obtained with errors. In practice, the interest consists in estimating the states or parameters of the dynamic system. One of the difficulties commonly encountered in the estimation of parameters in a stochastic differential equation is that the transition density of the equation cannot be evaluated in closed form. Many estimation methods are based on simulated maximum likelihood ([64]; [65]; [53]; [22]; [1]; [49] and [21]). Another standard method is to approximate the solutions using Markov Chain Monte Carlo (MCMC) algorithms ([44]; [27]; [13]; [66] and [67]). It is also possible to approximate solutions by using numerical methods ([23] and [26]). Other methods are based on Hermite expansions ([1]); Taylor approximations ([25]); filtering theory ([25]; [43] and [24]); approaches that use algorithms based on Gaussian quadrature integration and sigma-point methods ([28]; [29]; [60] and [23]); and, recently, adaptive MCMC algorithms based on numerical integration methods. In this work we consider the stochastic differential equation

dx_k = F(k, x_k, θ) dk + L(k, x_k, θ) dB_k,   x(k_0) = x_{k_0},   (1)

where x_k ∈ R^n is a vector of stochastic states in continuous time, x_{k_0} is a stochastic initial condition satisfying E(|x_{k_0}|^2) < ∞; {B_k : k ≥ 0} is an m-dimensional standard Brownian motion; F : [k_0, T] × R^n × R^d → R^n is a known nonlinear drift function, and L : [k_0, T] × R^n × R^d → R^(n×m) is a diffusion matrix that satisfies sufficient regularity conditions to ensure the existence of strong solutions ([52]). Theoretically the system is imperfectly observed through a process of continuous
observation y_k ∈ R^m, which is related to the process x_k by the equation

dy_k = h(x_k, k) dk + D_k dW_k,   (2)

where h : R^n × R → R^m is a known function, {W_k : k ≥ 0} is a standard Brownian motion independent of B_k, D_k is a diffusion matrix, and x_0 is the initial state. When the dynamic system in continuous time is approximated by a discrete system, the continuous-time derivatives are approximated by difference equations in discrete time, which can be expressed in the form of a state space model, as follows:

x_k = M_k(x_{k−1}) + u_k,   (3)
y_k = H_k(x_k) + v_k,   (4)
u_k ∼ F(·, ·).   (5)

The equation given in (3) represents the dynamic system, where x_k denotes the vector of unknown states at time t = k, u_k is a random state estimation error, and M_k is an operator that maps the transition state space into the state space. The equation given in (4) represents the observed system, where H_k is an operator that maps the states into the space of observations at time t = k, y_k is the vector of observations, and v_k is a vector of random observation errors. In equation (5), F(·, ·) is an unknown probability distribution function. The unobserved solutions x_k are assumed to follow a first-order Markov process, with initial distribution p(x_0) and transition density p(x_k|x_{k−1}) given by the model defined in equation (3). Each observation y_k is assumed conditionally independent given the state x_k, where p(y_k|x_k) is obtained from equation (4). The model is summarized in three steps: first an initial seed x_0 ∼ p(x_0) is generated; then x_k ∼ p(x_k|x_{k−1}) is predicted; and finally y_k ∼ p(y_k|x_k) is updated. The general objective is to estimate the unknown states x_{0:k} = (x_0, ..., x_k), based on measurements obtained from the observation process y_{1:k} = (y_1, ..., y_k); the main interest is in estimating the filtering distribution p(x_k|y_{1:k}), the predictive distribution p(x_{k+1}|y_{1:k}), and the posterior mean and covariance. The recursive estimation at time k is performed by the following procedure:

• Step 1: The filtering distribution: p(x_k|y_{1:k}).
• Step 2: The predictive distribution: p(x_{k+1}|y_{1:k}) = ∫ p(x_{k+1}|x_k) p(x_k|y_{1:k}) dx_k.
• Step 3: The posterior predictive expectation: E(x_{k+1}|y_{1:k}) = ∫ x_{k+1} p(x_{k+1}|y_{1:k}) dx_{k+1}.
• Step 4: The posterior predictive covariance: Cov(x_{k+1}|y_{1:k}) = ∫ (x_{k+1} − E(x_{k+1}|y_{1:k}))(x_{k+1} − E(x_{k+1}|y_{1:k}))^T p(x_{k+1}|y_{1:k}) dx_{k+1}.

Assume that v_k is distributed according to a known probability density function, v_k ∼ N(0, R_k), with fixed but unknown parameters; R_k is to be estimated. If, in addition, the errors u_k do not have a known distribution, that is u_k ∼ F(·, ·), where F(·, ·) is an unknown function, then it is necessary to estimate the probability distribution function F(·, ·). At some time t = k, we can estimate the joint posterior distribution p(x_{0:k}|y_{1:k}) using Bayes' theorem:

p(x_{0:k}|y_{1:k}) = p(y_{1:k}|x_{0:k}) p(x_{0:k}) / p(y_{1:k}).

It is possible to obtain a recursive formula to estimate the joint density p(x_{0:k}|y_{1:k}), where the filtering distribution p(x_k|y_{1:k}) is estimated by the following recursive Bayesian filter:

1. Initialize with a prior distribution p(x_0).
2. For k = 1, ..., N run the following:
• Prediction step: the state x_k is predicted, which can be calculated:
  - In the discrete case, by the Chapman-Kolmogorov equation:
    p(x_k|y_{1:k−1}) = ∫ p(x_k|x_{k−1}) p(x_{k−1}|y_{1:k−1}) dx_{k−1}.   (12)
  - In the continuous case, by integrating the Fokker-Planck-Kolmogorov equation.
• Update step: given the observation y_k, the predictive distribution is updated by:
    p(x_k|y_{1:k}) = p(y_k|x_k) p(x_k|y_{1:k−1}) / ∫ p(y_k|x_k) p(x_k|y_{1:k−1}) dx_k.   (14)

Given the difficulty of calculating the integrals involved in (12) and (14), a way to solve the situation is to use computational techniques to approximate the posterior distribution. This article proposes three recursive algorithms to achieve this goal.
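The prediction-update recursion above can be sketched numerically. The following is a minimal illustrative sketch, not the paper's algorithm: it evaluates the discrete Chapman-Kolmogorov prediction and the Bayes update on a fixed grid of states. The grid, transition matrix and likelihood used in the example are assumptions chosen for illustration.

```python
import numpy as np

def bayes_filter_grid(grid, prior, trans, lik, ys):
    """Recursive Bayesian filter on a fixed grid.

    grid  : (m,) state grid points
    prior : (m,) p(x_0) evaluated on the grid
    trans : (m, m) matrix with trans[i, j] = p(x_k = grid[i] | x_{k-1} = grid[j])
    lik   : function (y, grid) -> (m,) likelihood p(y | x) at each grid point
    ys    : sequence of observations
    Returns the filtered densities p(x_k | y_{1:k}), one per observation.
    """
    p = prior / prior.sum()
    out = []
    for y in ys:
        # Prediction step: discrete Chapman-Kolmogorov (a matrix-vector product).
        p_pred = trans @ p
        # Update step: Bayes' rule with the new observation, then renormalize.
        p = lik(y, grid) * p_pred
        p = p / p.sum()
        out.append(p)
    return out

# Illustrative setup: Gaussian random-walk transition, Gaussian observation noise.
grid = np.linspace(-5, 5, 201)
prior = np.exp(-0.5 * grid**2)
T = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.5) ** 2)
T = T / T.sum(axis=0)                      # each column is a valid transition density
lik = lambda y, g: np.exp(-0.5 * ((y - g) / 0.3) ** 2)
ps = bayes_filter_grid(grid, prior, T, lik, [1.0, 1.2, 1.1])
```

After the three updates, the filtered density concentrates near the recent observations, which is the behavior the exact recursion (12)-(14) would produce.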
A particular problem to be solved in this work is to estimate a probability density using a nonparametric Bayesian method involving an infinite mixture model, as considered in [42] and [56], but in the context of stochastic differential equations. We propose to use the following hierarchical structure to flexibly approximate the probability density of the solutions of the equation given in (1):

x_i | θ_i ∼ F(x_i | θ_i),   θ_i | G ∼ G,   G ∼ DP(α, G_0),

where DP denotes a Dirichlet process with concentration parameter α and base distribution G_0.

Filtering algorithms to approximate a density
Here, three filtering algorithms are proposed to approximate the solutions of a stochastic differential equation: the Gaussian Mixture Filter (GMF), the Nonparametric Particle Filter (NPF), and the Gaussian Particle Filter (GPF).

Nonparametric density estimation
The problem of estimating a density using a nonparametric method consists of the following: data x_1, ..., x_n are assumed to be an exchangeable sequence of independent observations taken according to an unknown probability density function F, that is, x_i ∼ F for i = 1, ..., n. The x_i can be scalars or vectors. Traditional methods of parametric statistics consider probability models indexed by unknown finite-dimensional parameters, such as the mean and variance, which must be estimated. In contrast, Bayesian nonparametric methods consider a prior probability model p(F) for the unknown distribution function F, where F is a function in an infinite-dimensional space. This requires the definition of a random probability measure on a collection of distribution functions, which should have certain properties, such as a large support and the condition that the resulting posterior distribution must be analytically tractable ([16]).
The DP ([16] and [62]) is an important Bayesian nonparametric modeling tool. A random probability distribution F is generated by a DP if, for any partition A_1, ..., A_k of the sample space Ω, the vector of probabilities satisfies

(F(A_1), ..., F(A_k)) ∼ Dir(αF_0(A_1), ..., αF_0(A_k)),

where Dir(α_1, ..., α_k) denotes the Dirichlet distribution; F_0(·) is a base probability measure defining the expected value, E(F) = F_0, or equivalently E(F(A)) = F_0(A) for any A ⊂ Ω, and α is a precision parameter that defines the variance. The motivation to use the DP is that, in this approach, updating the posterior distribution is a simple process. That is,

F | x_1, ..., x_n ∼ DP(α + n, (αF_0 + Σ_{i=1}^n δ_{x_i}) / (α + n)),

where δ_{x_i} denotes the Dirac delta.
In a Dirichlet process mixture (DPM) model, samples x_i, for i = 1, ..., n, are taken from a component of the mixture parametrized by θ_i ∈ Θ, where θ_i is a latent variable. This auxiliary variable is generated by an unknown distribution G, which represents a random probability measure distributed according to a prior distribution (a distribution on the set of probability distributions). That is, the DPM is hierarchically defined as follows:

x_i | θ_i ∼ F(x_i | θ_i),   θ_i | G ∼ G,   G ∼ DP(α, G_0),

where the mixing distribution G is a DP with concentration parameter α and base distribution G_0. Integrating over the random measure G, we obtain a nonparametric model to estimate F:

F(x) = ∫ F(x | θ) dG(θ).

There are two basic representations of the DP, known as the Polya urn scheme ([7]) and the stick-breaking prior ([62] and [54]). The Polya urn scheme assumes that the joint distribution of θ_1, ..., θ_n can be expressed as follows:

θ_i | θ_1, ..., θ_{i−1} ∼ (αG_0 + Σ_{l<i} δ_{θ_l}) / (α + i − 1),

where δ_{θ_l} is a point mass at θ_l. When α → 0, all the θ_i are equal to a single draw θ_1 ∼ G_0, and when α → ∞, the θ_i are independent draws from G_0. Moreover, since the θ_i are exchangeable, the Polya urn scheme can be written as

θ_i | θ_{−i} ∼ (αG_0 + Σ_{l≠i} δ_{θ_l}) / (α + n − 1),

where θ_{−i} represents {θ_l : l ≠ i}.
A strategy for representing the DP, using the so-called stick-breaking scheme, is to write the measure G explicitly as an infinite sum of atomic measures [62]. Realizations of a DP are expressed as follows:

G = Σ_{k=1}^∞ w_k δ_{θ_k},   θ_k ∼ G_0,

where δ_{θ_k} denotes the Dirac delta measure located at θ_k, and the weights are generated by w_k = v_k Π_{l<k} (1 − v_l), with v_k ∼ Beta(1, α). The underlying random measure G is discrete with probability one. Using equation (19), a flexible prior model to estimate the unknown distribution F is obtained:

F(x) = Σ_{k=1}^∞ w_k F(x | θ_k).

A recursive form to estimate the unknown probability density nonparametrically is given as follows:

Algorithm 1: Nonparametric algorithm to estimate a density
• Step 1: Simulate the latent mixture parameters.
• Step 2: Simulate the component parameters, where the conditional mean and the conditional variance are obtained from their conditional distributions.
• Step 3: Calculate the mixture weights.
• Step 4: Generate the conditioned states x_{k+1} | x_k.
• Step 5: Generate the observations of the states y_k | x_{k+1}.
Here IW denotes the Inverse Wishart distribution, and NIW(·, ·, ·, ·) denotes a Normal Inverse Wishart distribution.
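The stick-breaking construction can be sketched with a finite truncation: generate v_k ∼ Beta(1, α), form the weights w_k, and attach an atom θ_k ∼ G_0 to each. The truncation level and the standard-normal base measure in the example are illustrative assumptions, not part of the paper's specification.

```python
import numpy as np

def stick_breaking(alpha, base_sampler, rng, trunc=200):
    """Truncated stick-breaking realization of G ~ DP(alpha, G0).

    v_k ~ Beta(1, alpha); w_k = v_k * prod_{l<k} (1 - v_l); atoms theta_k ~ G0.
    Returns the weights and atom locations of the (discrete) measure G.
    """
    v = rng.beta(1.0, alpha, size=trunc)
    # w_k = v_k * prod_{l<k} (1 - v_l), with an empty product equal to 1.
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w[-1] = 1.0 - w[:-1].sum()    # fold the leftover stick mass into the last atom
    thetas = base_sampler(rng, trunc)
    return w, thetas

# Illustration with G0 = N(0, 1) and alpha = 5.
rng = np.random.default_rng(1)
w, th = stick_breaking(5.0, lambda r, n: r.normal(size=n), rng)
```

Folding the leftover mass into the last atom keeps the truncated weights an exact probability vector, a common device when a finite approximation of G is needed inside a sampler.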

Gaussian mixture filter (GMF)
Assume that the observation model is Gaussian, that the prior model of a state x is approximated by a Gaussian mixture p(x) (equation (32)), and that the likelihood is approximated by a Gaussian mixture p(y|x) (equation (33)). Then, using equations (32) and (33), the posterior distribution is again a Gaussian mixture. For a recursive algorithm, it is assumed that the probability density function of the initial condition p(x_0) is approximated by a Gaussian mixture (equation (39)). Without loss of generality, it is assumed that at instant k − 1 the system states are known, that is, p(x_{k−1}|y_{1:k−1}) is available as a Gaussian mixture, and that x_{k+1}|x_k is Gaussian. The conditional state x_k, given the observed data y_{1:k−1}, is then approximated by a Gaussian mixture whose components involve integrals I_{i,j}(x_k). The evaluation of I_{i,j}(x_k) can be treated as an estimation problem for a nonlinear process with a non-Gaussian distribution; in the literature, techniques such as ensemble Kalman filters or sigma-point Kalman filters have been used to approximate I_{i,j}(x_k) (see [41]). Furthermore, the likelihood of the data is assumed to be Gaussian. A summary of the algorithm used to estimate the density function by the GMF is given in the following procedure:

Algorithm 2: GMF
• Step 1: Simulate the initial states p(x_0) as shown in equation (39).
• Step 2: Use Algorithm 1 to estimate the density of the solutions.
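The measurement update behind the GMF can be illustrated in the simplest setting: a one-dimensional Gaussian mixture prior with a linear-Gaussian observation y = x + v, v ∼ N(0, R). This is a hedged sketch of that special case, not the paper's full GMF with the I_{i,j}(x_k) integrals: each component is updated with a scalar Kalman step, and the mixture weights are reweighted by each component's marginal likelihood of y.

```python
import numpy as np

def gm_update(weights, means, variances, y, R):
    """Measurement update for a 1-D Gaussian mixture prior under y = x + v, v ~ N(0, R).

    Each mixture component gets a scalar Kalman update, and the component
    weights are multiplied by the evidence N(y; m_i, P_i + R) and renormalized.
    """
    S = variances + R                     # innovation variance per component
    gain = variances / S                  # scalar Kalman gain
    new_means = means + gain * (y - means)
    new_vars = (1.0 - gain) * variances
    # Component evidence N(y; m_i, P_i + R) reweights the mixture.
    ev = np.exp(-0.5 * (y - means) ** 2 / S) / np.sqrt(2.0 * np.pi * S)
    new_w = weights * ev
    new_w = new_w / new_w.sum()
    return new_w, new_means, new_vars

# Two well-separated components; an observation near the second one
# shifts almost all posterior weight onto that component.
w, m, v = gm_update(np.array([0.5, 0.5]), np.array([-2.0, 2.0]),
                    np.array([1.0, 1.0]), 2.0, 0.25)
```

With the observation at y = 2, the component centered at 2 keeps its mean, its variance contracts to (1 − K)P = 0.2, and it absorbs nearly all of the posterior weight.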

Nonparametric particle filter (NPF)
Suppose we want to approximate the filtered density p(x_k|y_{1:k}) using an importance function in the marginal state space of x_k. A strategy for estimating the filtered distribution is to use a sequential Monte Carlo method known as the particle filter. Particle filters consider a random sample {x_k^(i), w_k^(i)}_{i=1}^n which characterizes the posterior probability density function p(x_k|y_{1:k}). The weights are chosen such that Σ_{i=1}^n w_k^(i) = 1. Then the posterior distribution, also called the filtered distribution at time k, can be approximated by an empirical distribution formed by the mass points or particles:

p(x_k|y_{1:k}) ≈ Σ_{i=1}^n w_k^(i) δ(x_k − x_k^(i)),   (45)

where δ(·) is the Dirac delta function. Given the posterior distribution, some quantities of interest can be estimated, such as the expected value of a function g(x_k) associated with the filtered distribution p(x_k|y_{1:k}):

E[g(x_k)|y_{1:k}] ≈ Σ_{i=1}^n w_k^(i) g(x_k^(i)).   (46)

Stat., Optim. Inf. Comput. Vol. 4, December 2016

Some of the researchers who have worked on the subject are [71]; [20]; [32]; [37]; [11]; [38]; [55]; [10]; [17]; [19]; [39], among others. The weights w_k^(i) are chosen using the principle of importance sampling, which is based on the following argument. Suppose that p(x) ∝ γ(x) is a probability density that is difficult to sample from, but γ(x) can be evaluated, and consequently p(x) can be evaluated up to a proportionality constant. Let x^(i) ∼ q(x), i = 1, ..., n, be a sample generated from a proposal distribution q(·), called the importance density. Then a weighted density approximation of p(·) is given by

p(x) ≈ Σ_{i=1}^n w^(i) δ(x − x^(i)),   where   w^(i) ∝ γ(x^(i)) / q(x^(i)),

and w^(i) represents the normalized weight of the i-th particle. If samples {x_{0:k}^(i)} are taken using an importance density q(x_{0:k}|y_{1:k}), then the weights used to approximate equation (46) are obtained by

w_k^(i) ∝ p(x_{0:k}^(i)|y_{1:k}) / q(x_{0:k}^(i)|y_{1:k}).

If the importance density can be factored so that

q(x_{0:k}|y_{1:k}) = q(x_k|x_{0:k−1}, y_{1:k}) q(x_{0:k−1}|y_{1:k−1}),

then we can obtain the samples x_{0:k}^(i) from q(x_{0:k}|y_{1:k}) by augmenting each of the existing samples x_{0:k−1}^(i), drawn from q(x_{0:k−1}|y_{1:k−1}), with a draw from q(x_k|x_{0:k−1}, y_{1:k}). For the updated weights, the filtered distribution p(x_k|y_{1:k}) is expressed in terms of the equation given in (45). The updated weights are given by

w_k^(i) ∝ w_{k−1}^(i) p(y_k|x_k^(i)) p(x_k^(i)|x_{k−1}^(i)) / q(x_k^(i)|x_{0:k−1}^(i), y_{1:k}).   (51)

In particular, if q(x_k|x_{0:k−1}, y_{1:k}) = q(x_k|x_{k−1}, y_k), then the importance density depends only on x_{k−1} and y_k. This situation is appropriate when the filtered estimator p(x_k|y_{1:k}) is required in real time at step k; the weights are modified accordingly. The filtered density p_n(x_k|y_{1:k}) can then be approximated by the weighted particle representation (52), formed as in equation (45) with the weights of (51). It was proved in [12] that when n → ∞ the approximation in (52) approaches the true posterior distribution p(x_k|y_{1:k}).

This paper seeks to approximate the filtered density function by using nonparametric statistical techniques. A Dirichlet process mixture is used to estimate the unknown density of the states. The algorithm performs the prediction of equation (12) and the update of equation (14) based on simulated samples. To implement the algorithm, it starts with a set of random samples from the prior of the initial state. The procedure is summarized as follows:

• Step 1: Initialize with a nonparametric prior distribution of the initial state (for k = 0 and i = 1, ..., n).
• Step 2: For i = 1, ..., n, propagate each sample through the transition model.
• Step 3: Obtain an estimated prediction density; for example, the predictive mean and covariance can be estimated from the samples.
• Step 4: For i = 1, ..., n, sample the importance density.
• Step 5: Calculate the importance weights.
• Step 6: For i = 1, ..., n, sample x̃_k^(i) ∼ p_n(x_k|y_{1:k}) and calculate the importance weights, which approximate the filtered density function. The updated filtered distribution can be characterized by its mean and covariance.
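The weight recursion above simplifies in the common bootstrap variant, where the importance density is the transition prior q(x_k|x_{k−1}, y_k) = p(x_k|x_{k−1}), so the weight update (51) reduces to w_k^(i) ∝ p(y_k|x_k^(i)). The following is a hedged sketch of that simplified bootstrap filter with multinomial resampling, not the paper's full nonparametric filter; the random-walk transition and Gaussian likelihood in the example are illustrative assumptions.

```python
import numpy as np

def bootstrap_pf(x0, propagate, loglik, ys, rng):
    """Bootstrap particle filter: proposal = transition prior, weights = likelihood.

    x0        : (n,) initial particles drawn from p(x_0)
    propagate : function (particles, rng) -> particles, samples p(x_k | x_{k-1})
    loglik    : function (y, particles) -> per-particle log p(y | x)
    Returns the filtered posterior means, one per observation.
    """
    x = np.asarray(x0, dtype=float)
    n = len(x)
    means = []
    for y in ys:
        x = propagate(x, rng)               # prediction: sample the transition
        logw = loglik(y, x)
        w = np.exp(logw - logw.max())       # stabilized exponentiation
        w /= w.sum()                        # normalized importance weights
        means.append(float(np.sum(w * x)))
        idx = rng.choice(n, size=n, p=w)    # multinomial resampling
        x = x[idx]
    return means

# Illustration: random-walk state, Gaussian observations of a state near 1.0.
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=2000)
prop = lambda x, r: x + r.normal(0.0, 0.1, size=x.shape)
ll = lambda y, x: -0.5 * ((y - x) / 0.2) ** 2
means = bootstrap_pf(x0, prop, ll, [1.0] * 20, rng)
```

Resampling after every step keeps the particle cloud concentrated in regions of high posterior probability, at the cost of some Monte Carlo variance from the multinomial draw.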

Gaussian particle filter (GPF)
The interest in what follows is to present a method for approximating the solution using the Gaussian Particle Filter (GPF) proposed in [34]. This algorithm generates samples that approximate the predictive distribution p(x_k|y_{1:k−1}) and the filtered distribution p(x_k|y_{1:k}) using Gaussian densities, where the mean vector and the variance-covariance matrix are estimated from the generated particles. If at time k − 1 the solution is approximated by N(µ_{k−1}, Σ_{k−1}), and at time k it is approximated by N(µ̂_k, Σ̂_k), then the GPF proceeds by sampling particles from the current Gaussian approximation, propagating them through the transition model, weighting them by the likelihood of the new observation, and matching the first two moments of the weighted particle cloud to obtain µ̂_k and Σ̂_k.
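One step of this scheme can be sketched in the scalar case; this is a simplified illustration in the spirit of the GPF of [34], with the transition and likelihood passed in as assumed user-supplied functions.

```python
import numpy as np

def gaussian_pf_step(mu, var, propagate, loglik, y, rng, n=20000):
    """One scalar Gaussian-particle-filter step: sample from N(mu, var),
    propagate, weight by the likelihood of y, and moment-match the result."""
    x = rng.normal(mu, np.sqrt(var), size=n)   # particles from the Gaussian approx
    x = propagate(x, rng)                      # predictive particles
    logw = loglik(y, x)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    mu_new = float(np.sum(w * x))              # moment-match the weighted cloud
    var_new = float(np.sum(w * (x - mu_new) ** 2))
    return mu_new, var_new

# Sanity check against the exact conjugate answer: prior N(0, 1), identity
# transition, observation y = 1 with noise variance R = 0.25. The exact
# posterior is N(0.8, 0.2), which the moment-matched step should recover.
rng = np.random.default_rng(0)
mu, var = gaussian_pf_step(0.0, 1.0, lambda x, r: x,
                           lambda y, x: -0.5 * (y - x) ** 2 / 0.25, 1.0, rng)
```

Because the filtered distribution is collapsed to a single Gaussian after every step, the GPF avoids resampling entirely, which is what keeps its cost low relative to a standard particle filter.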

Results
The first example consists of a linear case, while the second consists of a case with a nonlinear structure.
To measure the estimation quality of the proposed algorithms we use the root mean square error (RMSE), defined as

RMSE = sqrt( (1/N) Σ_{k=1}^N (x_k − x̂_k)^2 ),

where x_k denotes the true state and x̂_k its estimate at time k.
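The goodness-of-fit measure above is straightforward to compute; the following helper is a direct transcription of the formula.

```python
import numpy as np

def rmse(x_true, x_est):
    """Root mean square error between the true and estimated state trajectories."""
    x_true = np.asarray(x_true, dtype=float)
    x_est = np.asarray(x_est, dtype=float)
    return float(np.sqrt(np.mean((x_true - x_est) ** 2)))
```

For example, rmse([0, 0], [3, 4]) evaluates to sqrt((9 + 16) / 2) = sqrt(12.5).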
Example 1

In Table I, the results of the goodness-of-fit measure used to gauge the estimation quality of the algorithms are shown. In this case, the RMSE calculations in the table show no significant differences in the estimated errors or in the computing time. Figure 1 shows three graphs depicting the reconstruction of the true Lorenz attractor and the posterior means of the solution estimated by the three proposed algorithms; it can be seen that all filters fit the original Lorenz map almost perfectly.

Example 2
The second example is a spatio-temporal model based on a stochastic differential equation for predicting rainfall, described in [4], where x_t denotes the total rainfall at time t at a location s = (x, y) ∈ R^2, and x_0 is an initial state. Suppose that during a very short time interval ∆t the change in total rainfall takes the values (∆x)_1 = γ and (∆x)_2 = 0 with probabilities p_1 = λ∆t and p_2 = 1 − λ∆t, respectively. The expected value and the mean squared change in the total precipitation are given by

E(∆x) = γλ∆t,   E((∆x)^2) = γ^2 λ∆t.

Letting µ = γλ∆t and σ = sqrt(γ^2 λ∆t), a stochastic differential equation to model the total precipitation x_t is obtained:

dx_t = γλ dt + sqrt(γ^2 λ) dW_t,   (58)

where dW_t is a Brownian process with mean 0 and covariance matrix Q_t. The evolution in time of the marginal density of the state is given by the posterior marginal distribution p(x_k|y_{1:k−1}), which satisfies the Fokker-Planck equation

∂p/∂t = L(p),

where L is a diffusion operator defined by

L(p) = −Σ_i ∂[F_i p]/∂x_i + (1/2) Σ_{i,j} ∂^2[(LQL^T)_{ij} p]/∂x_i ∂x_j,

with p = p(x_k|y_{1:k−1}). The initial condition is given by p(x_{k−1}|y_{1:k−1}). Applying the Fokker-Planck equation to equation (58) yields equation (61). The next step is to find a solution to the equation given in (61); the Crank-Nicolson discretization method was used to do this (see [59] for details). To illustrate the methodology, a series of daily precipitation data was considered. The data include information from January 2011 until May 2012 for three meteorological stations located in Aragua (Ceniap, Tamarindo and Tucutunemo), Venezuela. The data are available at http://agrometeorologia.inia.gob.ve/.
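Sample paths of the precipitation model (58) can be generated with a simple Euler-Maruyama scheme; this is an illustrative sketch of the diffusion limit derived above (γ-sized jumps arriving at rate λ), with parameter values chosen only for the example, not taken from the paper.

```python
import numpy as np

def simulate_rainfall(x0, gamma, lam, dt, n_steps, rng):
    """Euler-Maruyama simulation of dx_t = gamma*lam dt + sqrt(gamma^2 * lam) dW_t,
    the diffusion limit of gamma-sized rainfall increments arriving at rate lam."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    drift = gamma * lam                       # gamma * lam, from E(dx)/dt
    diff = gamma * np.sqrt(lam)               # sqrt(gamma^2 * lam), from E(dx^2)/dt
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))     # Brownian increment over dt
        x[k + 1] = x[k] + drift * dt + diff * dW
    return x

# Illustration: gamma = 0.5, lam = 2 over a unit time horizon, so the
# expected accumulated rainfall at T = 1 is gamma * lam * T = 1.0.
rng = np.random.default_rng(0)
finals = [simulate_rainfall(0.0, 0.5, 2.0, 0.01, 100, rng)[-1] for _ in range(400)]
```

Averaging the endpoint over many paths recovers the drift γλT, which is a quick consistency check on the discretization.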
In Table II, the RMSE values estimated by the proposed algorithms are shown; we can observe low variability in the RMSE for the three filters at the Ceniap station. At the Tamarindo station, the lowest RMSE was obtained by the NPF, and the best estimates at the Tucutunemo station were obtained with the GPF. The computing times of the three filters are similar at all meteorological stations, although the computing time of the NPF is slightly lower than that of the other two filters. In Figures 2, 3 and 4, the graphs show the real states in black and the states simulated using the GMF in red, the NPF in blue and the GPF in green for the three stations. The behavior of the actual values and of the state values simulated by the three proposed algorithms is very similar.

Conclusions and discussion
This article has introduced a Bayesian nonparametric estimation method to approximate the solutions of a general stochastic differential equation. The main contribution of the work is an approach that combines Dirichlet process mixtures and Gaussian mixtures in terms of state space models. The proposed hierarchical structures are extremely flexible with regard to the specification of the prior assumptions on the parameters, are easy to interpret, and allow modeling a variety of physical phenomena under uncertainty. In addition, we propose an efficient way to predict and update the states of the filtered distribution of the dynamic system through the GMF, NPF and GPF algorithms. The methodology was illustrated by means of two stochastic differential equations, one synthetic and the other obtained from a real process. Since the approximate solutions were obtained from complex models, it is shown that the proposed filters perform well in the reconstruction of the Lorenz attractor and of the states of the rainfall model, at low computational cost. It is also shown that the algorithms work well in the context of nonlinear dynamic processes that require joint estimation of the states and of the distribution of the noise in the state equation. To measure the estimation quality of the algorithms, the root mean square error was used as a goodness-of-fit measure; this measure showed that the estimation errors are insignificant.

Figure 2. Real and estimated state by the GMF, NPF and GPF for Tamarindo station.

Figure 3. Real and estimated state by the GMF, NPF and GPF for Tucutunemo station.

Figure 4. Real and estimated state by the GMF, NPF and GPF for Ceniap station.

Table I. RMSE and computing time of the algorithms GMF, NPF, and GPF.

Table II. RMSE and computing time of the algorithms GMF, NPF, and GPF.