Stochastic Models to Estimate Population Dynamics

The growth dynamics that a population follows is mainly due to births, deaths or migrations, each of these phenomena is affected by other factors such as public health, birth control, work sources, economy, safety and conditions of quality of life in neighboring countries, among many others. In this paper is proposed two statistical models based on a system of stochastic differential equations (SDE) that model the dynamics of population growth, and three computational algorithms that allow the generation of probability distribution samples in high dimensions, in models that have non-linear structures and that are useful for making inferences. The algorithms allow to estimate simultaneously states solutions and parameters in SDE models. The interpretation of the parameters is important because they are related to the variables of growth, mortality, migration, physical-chemical conditions of the environment, among other factors. The algorithms are illustrated using real data from a sector of the population of the Republic of Ecuador, and are compared with the results obtained with the models used by the World Bank for the same data, which shows that stochastic models Proposals based on an SDE more adequately and reliably adjust the dynamics of demographic randomness, sampling errors and environmental randomness in comparison with the deterministic models used by the World Bank. It is observed that the population grows year by year and seems to have a definite tendency; that is, a clearly growing behavior is seen. To measure the relative success of the algorithms, the relative error was estimated, obtaining small percentage errors.


Introduction
Biological models are a multidisciplinary research field that studies the interactions between the internal and external elements of a real phenomenon.The process can be represented with a dynamic mathematical system; that is, by means of systems of ordinary differential equations, where it is assumed that the observed dynamics are exclusively determined by internal deterministic mechanisms.However, real biological systems are always exposed to external factors that are not completely controlled or not possible to model explicitly.Any biological process can be studied from the perspective of systemic biology, such as the growth of a cell, the interaction between two bacteria or the blood circulation in an organism.Therefore, the study of population dynamics as a system of interactions is part of the field of systemic biology.All the ecological theories regarding populations and communities have to do, in one way or another, with various aspects such as their sizes, how they grow or decrease, whether they show stable growth or fluctuate, how they respond to environmental changes and how they influence the environment of other populations.This suggests that an approximate way to include environmental contrasted with the estimates obtained by the World Bank models, unlike of the models, these estimates are based on censuses that are interpolations or extrapolations based on demographic models.
The two stochastic models proposed in this work are: stochastic growth suggested by [32], and second stochastic model that includes environmental variability according to [6].The structures of these models are widely used in many applications, such as: economics and finance ( [9], [11], [14], [15], [17], [18] among others), in robotics, [24], biological systems, [14], epidemiology, [20], [21] and [26], and for precipitation models, [37], [18], [19], [35] and [36], among others.The implementation of these models is done following a Bayesian methodology under which three algorithms are proposed: Metropolis-Hasting (MH), Monte Carlo Sequential (SMC) and Particle Markov Chain Monte Carlo (PMCMC).The first two algorithms are widely known in the literature, however the method based on the PMCMC algorithm is more recent and was proposed by [7], and is used to estimate the solutions and parameters of a stochastic differential equation of Itô, this algorithm is based on a combination of the Markov chains Monte Carlo methods (MCMC) and SMC methods.The idea is to divide the original sampling problem into smaller and simpler sampling problems, by focusing on some of the subcomponents of the target distribution q(.).
The rest of the article is organized as follows.In Section 2, the problem is introduced and the two models to be studied are proposed, in Section 3 the three computational algorithms used to estimate the growth projection of the population are developed, in Section 4 the results of illustrate the proposed methodology with real data and establish comparisons with standard methods, in Section 5 conclusions and discussions are established.

The problem
Natural phenomena are modeled generally in a suitable way by some Markov process which in turn can be written in the form of space state model.These models are constructed according to physical, chemical, environmental, or economic criteria, and on the bas is of incomplete observations or with noisy measurements.Markov processes are continuous-time processes with continuous trajectories but with measurements taken at a discrete time, and such events define the so-called partially observed diffusion processes ( [20]).To define the general problem, the notation given in [32], and suppose that x t ∈ R m is a continuous time signal denoting the solution of the stochastic differential equation of Itô, given by: Where • A : R k → R l is a set of (possibly nonlinear) functions of Lipschitz globally known forces.
• The parameters that are estimated, the last m rows of the drift matrix (the first k − m rows are known) The diffusion matrix C ∈ R k×m is of the form: Where Γ ∈ R m×m is a non-singular constant array.The estimated parameters in the drift and diffusion correspond to the coordinates, that are driven directly by the white noise.Assuming, the standard regularity conditions of existence and uniqueness of the solution x t of the stochastic differential equation given in (2) ( [31]); it is also assumed that the process x t is hypo-elliptical, this means that the noise propagates through the drift in all the components of the system given in (2).The structure of the C matrix restricts noise in such a way, that it only acts on a subset of the variables that can be called rough variables; these variables can evolve through the coupling in the drift.The remaining variables of the system are called smooth variables.To distinguish between rough variables and smooth variables, is introduced the notation x(t) = (u(t), v(t)) T , where u(t) ∈ R k−m is the smooth variable, and v(t) ∈ R m is the rough variable.
The problem of interest is to consider u as an observed variable, and v as an unobserved variable, and estimate the parameters Γ and the entries of the Θ rows that correspond to the trajectories of the non-observed variables v, considering that {u n } N n=0 is a known solution of the stochastic differential equation given in (2).Under the Bayesian paradigm, a priori distribution is needed for the parameters Θ and ΓΓ T , denoted as p(Θ, ΓΓ T ), and a likelihood for the data, denoted as L(u, v|Θ, ΓΓ T ), then the posterior distribution for the parameters can be estimated as: In this type of problem, usually the exact likelihood of the transition density is unknown, so it is natural to use approximations to be able to make inference.Suppose that for the Markov sequence {x t } T t=0 , the transition map of t → t + 1 is determined by: Then, the Euler-Maruyama approximation of this map is given by: Where is the identity matrix, and: It is a non-invertible matrix.To prevent R(0, Θ) from being invertible, the following approximation is used: Where R(∆t, Θ) is invertible, and is given by: R(∆t, Θ) = In general, the solution x t is observed with errors through a continuous observation process y t , that is related to the process x t through the following expression: Where g(.) is a vector function of known real value.Then, the model under study can be represented by: When the dynamic system of continuous time is approximated by a discrete time system, the derivatives in the continuous time domain are approximated by an equation in difference in discrete time, that is: The models given by ( 12) and ( 13) are known as the equation of state and the equation of observation, respectively.Using Bayesian algorithms is estimated the a posteriori marginal distribution p(x t |y 1:t ) in time t in two steps, prediction and update, where x 0:t = (x 0 , . . ., x t ) are the unknown solutions states and y 1:t = (y 1 , . . ., y k ) are the observed solutions states.
To carry out the recursive procedure, it is assumed that the a posteriori marginal density p(x t−1 |y 1:t−1 ) is known at a time t.Then, using the equation ( 12) is obtained the updated density p(x t |y 1:t−1 ) of the state in time t, and using the equation of Chapman-Kolmogorov transition: When a new observation y t is recorded, the posterior density is updated using the equation (13)  In the [32] is considered the solution x(t) = (u(t), v(t)) T divided into two parts, a known solution u t , and a solution v t unknown, so that this stochastic process can be seen as a partially observed latent process.The general objective is to estimate the missing solution and the parameters of the dynamic system proposed in the equation ( 2).The representation in the form of state space is given by: Where f 2 is a function, that maps the states within the same state space, g 2 is a function, that maps the state variables within the observed variables, and v 0 is a variable random with probability distribution given by p(.).
The joint probability distribution of the parameters Θ, the latent process v 1:t = (v 1 , . . ., v t ), and the observed solutions u 1:t = (u 1 , . . ., u t ) is given by: The latent process is not observed, and the interest is to estimate the marginal posterior distribution: Where: And: In general, the latent process can not be integrated, so MCMC and SMC algorithms are required to explore the space in parameter set Θ, and latent process, v 1:t .The ideal would be to integrate the latent process by calculating the marginal verisimilitude given in the equation ( 19), then the MCMC algorithm could be run only in the parameter space.
In particular, the problem that is proposed to be solved in this work, consists in the adjustment of two stochastic models to project the dynamics that population growth follows:

A. Langivin model of second order for stochastic growth
In this model, the problem of systems of second-order hypoeliptical equations discussed in [32] and that have a physical meaning where u is considered to be the position and v is the momentum, and the general form of Langevin's second order equation given by: dv = udt (21) Where f is a force function (possibly nonlinear) parameterized by D, dB is a Brownian Movement, and the variables v and u are scalars, while the parameters γ, D, σ and the states must be estimated.In this paper is considered a particular case of this model known as a stochastic growth model for which In this case, the process has a single diffusion parameter, σ, which describes the size of the fluctuations, and which must be estimated in conjunction with the unknown solutions states, v t .Expressing the model (24) in terms of the equation ( 2), is obtained that: Then, the stochastic growth model in its discretized form uses a order first approximation of Euler-type with a step ∆t for the equation ( 23), and for the second equation ( 24) the usual approximation of a stochastic integral is used, obtaining the following model: The parameters and states to be estimated σ t and v 1:T = (v 1 , . . ., v T ), respectively.

B. Differential equations system
Following [6], a system of differential equations is considered to model the stochastic growth of a population, but including environmental variability.
This system of differential equations is given: Discretizing the model using the first-order Euler approximation, is obtained: Where the parameters and states to be estimated are β 1 , β 2 , α 1 , α 2 and y 1:T = (y 1 , . . ., y T ), respectively.

Methodology
In this paper, is proposed to use three computational algorithms based on the MCMC methods and the SMC methods, to estimate the projection of the growth dynamics of a population.The algorithms proposed are:
Suppose is wanted to design an MH algorithm, whose stationary distribution is p (v 1:t , Θ|u 1:t ), where v 1:t are latencies unknown and u 1:t are the observations.Then in iteration l + 1, is proposed a candidate ) and accept this with probability: Assuming that the following distribution can be used as the proposed distribution: Then, the probability of acceptance is simplified as follows: However, two difficulties arise here.To approximate the equations ( 29) and (30), both the conditional distribution or filtered distribution p(v 1:t |u 1:t , Θ, κ), and the marginal distribution p(u 1:t |Θ, γ.σ), but both distributions are generally unknown, and in turn are required for can calculate probability of acceptance α.
The algorithm is summarized below: Algorithm 1: Metropolis-Hastings algorithm And • Step 2. For a time: l = 1 until N , generate the candidate values: And The current parameters v c 1:t and Θ c are changed to v (l−1) 1:t and Θ (l−1) , respectively, with probability of acceptance α, and in another case the string remains in the current parameters.
It increases from l to l + 1, and is returned 2 step by step.

Sequential Monte Carlo Algorithm
The SMC procedures are a class of algorithms, that are used to sequentially densities approximate of the posterior densities {p(v 1:t |u 1:t ); n ≥ 1}, as well as marginal likelihood sequences {p θ (u 1:t |u 1:t ); n ≥ 1} for θ ∈ Θ given.The SMC algorithm generates a set of K particles {v J } K l=1 , to approximate the conditional distribution p(v 1:t |u 1:t ), using an empirical measure: Where J is called weight of importance associated with the particles v (l) 1:t , and δ is the delta function of Dirac.The SMC algorithm is summarized below: -Compute and normalize the weights: And • Step 2. Time j = 2, . . ., J : -Sample K iid variables v (l) 0:j−1 according to the distribution: -Then, for each particle l = 1, . . ., K, sample: (i.e propagate the particle), and set v (l) j ) -Finally compute and normalize the weights: The simulation of one trajectory v 1:J = (v 1 , . . ., v J ) called run of SMC algorithm under the approximation of p(v 1:t |u 1:t , Θ, γ, σ), it is directly achieved by randomly one particle v (l) among the K particles with weights {W (l) j } K l=1 .Basides, the marginal distribution p(u 1:t |Θ, γ, σ) can be estimated through the weights:

Particle Markov Chain Monte Carlo Algorithm
The PMCMC algorithms are exact approximations of a posteriori distributions p(v 1:t |u 1:t ).These algorithms combine the SMC algorithms with the MCMC methods to approximate both p(v 1:t |u 1:t ) or p(Θ, v 1:t |u 1:t ), in the sense that for any fixed number of particles N ≥ 1 transition densities leave invariant target density.The SMC algorithm is an approximate simulation procedure for the target density p(v 1:t |u 1:t ), and the latter is used as the proposed distribution for the MH algorithm.
The PMCMC methods are used to simultaneously estimate the parameters and states in non-linear spatial state models.This procedure can be interpreted as a standard update of the MH method, which leads to convergent algorithms.
The key to this PMCMC algorithm is to approximate the target density p(v 1:t |u 1:t ), by choosing an optimal proposed density q(v 1:t |u 1:t ) = p(v 1:t |u 1:t ) and implementing a p(v 1:t |u 1:t ) within a step of the MH algorithm, which leads to a familiar and simple method.Previously, an SMC simulation procedure is run to approximate p(v 1:t |u 1:t ) but this procedure can not be implemented directly, since when calculating the acceptance rate of the MCMC algorithm, this it is required evaluating the marginal density, that is generated from an SMC algorithm.
Consider v * 1:t a candidate sample that updates the state v 1:t , with probability of acceptance given by: The optimal choice of q(v 1:t |u 1:t ) is: However, in practice this choice becomes impossible.The developments obtained for the SMC methods ( [7]), suggest the idea of using the approximation of p(v 1:t |u 1:t ) as the proposed density, this means, that a sample is extracted as follows: Where δ is the Dirac delta function.Given the samples and weights

}
, for calculating the probability of acceptance is required the expression of the marginal distribution of v * 1:t , which is difficult to estimate, however the SMC algorithm allows is obtained an unbiased estimator of marginal likelihood: This is a low cost algorithm because it requires very little input information, and provides satisfactory results, is needed to specify densities of importance in one dimension: For N ≥ 2, but when T is too large the algorithm has degeneration problems due to successive resampling steps, that decrease the number of different values for v * t .
ranges from year 1960 up to 2035.These results are contrasted with the estimates obtained by the World Bank models, which are generally based on national censuses of the population.The estimates for the years before and after the censuses are interpolations or extrapolations based on demographic models.The data is used property of the World Bank and is available on the website: https://data.worldbank.org/.To compare the estimates of the models, the percentage error (EP), which is a measure of the relative error expressed in percentage, with the estimated states ŷ and the true values observed y i .The percentage of the error can be used to compare each error in terms of 100%.The EP is defined as:

Langiven stochastic growth model
In order to adjust Langiven's stochastic growth model, the data from the population growth of male and female in the Republic of Ecuador were considered.In the by the algorithms proposed in this work: MH, SMC and PMMH, together with the projections made by the World Bank model.In these graphs and tables (2) and (3), it can be observed that the algorithms proposed have a good performance in the prediction of the population up to the year 2035, taking as reference the model of the World Bank but from there onwards one can observe instability in the projections.The estimates provided by the World Bank come from demographic models and, therefore, are susceptible to biases and errors due to deficiencies in both the model and the data.For various reasons, a stochastic model is preferable to a deterministic one to simulate the growth of real populations, since there may be an implicit demographic randomness because the population is composed of a finite number of individuals, that are subject to random events whose effects may be relevant in small populations due to the risk of extinction.On the other hand, there is always a randomness due to sampling since populations are estimated by random sampling subject to measurement errors.In addition, there is an environmental randomness that is given by external factors such as climate, epidemics that can affect populations, and these factors have a random behavior.Consequently, stochastic models allow to better understand and investigate the influence of noise on the dynamics of population growth and make more reliable predictions because the modeling structure is the most appropriate.
In the Figures (3) and ( 4) show the percentage errors for the estimated values of the population of female and male in ages between 35 and 39 from 2015 up to 2035, obtained using the algorithms MH, SMC, and PMMH, with respect to the World Bank model.The results indicate that the percentage errors for the estimated states are much lower with respect to the values provided from the model of the World Bank model, with the exception of the results obtained using the PMMH algorithm, that show a very high percentage error from the year 2030, which implies a divergence.These estimated values validate the results shown in the Figures (1) and ( 2) and in the tables (2) and (3), observing good prediction regarding the population growth of female and male up to the year 2035, which implies an increase in the human population, either by immigration or by the increase in the birth rate with respect to the mortality rate.All this can affect natural resources and social infrastructure, affecting the government's commitment to maintain services and infrastructure.
In the Table (4) the posterior estimate of the σ parameter of the population growth model is shown, for the three algorithms proposed up to the year 2035.The results indicate, a close stability in the estimated value of the

Stochastic model of population growth with variability in the environment
To adjust the stochastic model of population growth that experiences variability in the environment, data were considered that stem from the population growth of the Republic of Ecuador, both female and male.The Table (5) shows the a priori distributions that were considered for the adjustment of the stochastic model.In the Figures (5)  of Ecuador) are shown by the algorithms MH, SMC, and PMMH, together with the projections made by the World Bank models, and as commented above, these estimates come from demographic models and, therefore, are susceptible to biases and errors due to deficiencies in both the model and the data.In these Figures and in the Tables (6) and (7) it can be seen that the algorithms proposed have a good performance in the prediction of the population up to the year 2035, taking as reference the model of the World Bank.The Figures (7) and (8) show the percentage errors for the estimated values of the population of female and male between 35 and 39 years of age.The Republic of Ecuador from 2015 up to 2035, estimates obtained with the algorithms MH, SMC and PMMH with respect to the World Bank model.These results show that the percentage errors for the estimated states using the MH, SMC and PMMH algorithms are lower with respect to the values provided by the World Bank model.These estimated values validate the results shown in the Figures (5) and ( 6) and in the Tables ( 6) and ( 7), observing good prediction regarding the population growth of women and men up to the year 2035, which corroborates an increase in the human population, which as mentioned above may be due to immigration or an increase in the birth rate with respect to the mortality rate, which would affect the use of natural resources and the development of social infrastructure and in consequence affects government commitment for maintaining these services and this infrastructure properly.
In the Tables (8) and ( 9) the posterior estimation of the parameters β 1 , β 2 , α 1 and α 2 of the stochastic population growth model for the female and male data respectively, using the three proposed algorithms for the year 2035.In these results a stability is observed in the values of the parameters for each algorithm, that is closed to the data, in addition a common convergence to the value of the parameters with respect to the convergence of each algorithm is observed.

Discussions and Conclusions
The stochastic models used to study population dynamics through stochastic differential equations are well known in the field of statistics.However, in the current literature there are not many efficient and reasonable estimation methods from the computational point of view, that allow dealing with these complex non-linear structures that in most cases do not have closed expressions for the approximation of the likelihood.In this paper two stochastic growth models were proposed, a first model determined by the second-order stochastic differential equation of Lagiven, and a second model determined by a system of stochastic differential equations that models the environmental variability of a population.In addition, three computational algorithms were provided: MH, SMC and PMCMC to estimate the solutions states and parameters.By combining the SMC and MCMC methods, the problem of large dimensions is divided into many low-dimensional problems, that simplify calculations without sacrificing the properties of the target distribution.The adjustment of the models was carried out using real data from the Republic of Ecuador, and the results obtained predict a population growth from the year 2018 to the year 2035.These results were compared with those estimated by the deterministic models of the World Bank, indicating that with the models proposed in this work under the Bayesian methodology, the estimated data more appropriately fits the real population growth.Is estimated later the parameters of the models, and it is shown that the proposal adequately and reliably models the dynamics of demographic randomness, sampling errors and environmental randomness better than with the deterministic models used by the World Bank.To measure the relative success of the estimation algorithms, the relative error expressed as a percentage was proposed as a measure of goodness of fit, obtaining small percentage errors for the proposed models.Know a priori the population growth of a country can prevent the social and economic impact; in addition, the results obtained by these models can serve as a reference for decision making of government agencies, that are responsible for planning urban development policies, infrastructure, health services, education, drinking water, and growth harmonized with the environment, among others.

Figure 6 .
Figure 6.Algorithm: MH , SMC and PMMH for Male Estimated BM MH SMC PMMH Years 545494 536611 572027 572027 2015 576000 566000 603356 603356 2018 598000 587000 625742 625742 2020 659000 648000 690768 690768 2025 697000 690000 735540 735540 2030 724000 720000 767520 767521 2035 y t represents the size of the population, b t represents the birth rate per capita, d t represents the death rate per capita, b e represents the average per capita birth rate, d e represents the average death rate per capita, W i (t), i = 1, 2, 3, are independent standard Wiener processes.The parameters are β 1 , β 2 , α 1 and α 2 , but if environmental variability does not occur, then the value of the parameters is considered as

Table 1 .
(1)le(1), the a priori distributions for the adjustment of the stochastic population growth model are indicated.In the Figures(1)and (2), the estimates of the states (growth of the considered population of Ecuador) are shown A priori values to initialize the parameters of the MH, SMC, and PMMH algorithms, for the female and male data.

Table 2 .
Estimated values of the population of male from 35 to 39 years old of the Republic of Ecuador by the World Bank (BM) and the algorithms MH, SMC and PMMH Stat., Optim.Inf.Comput.Vol.7, June 2019 S. INFANTE, L. S ÁNCHEZ AND A. HERN ÁNDEZ 323 Figure 3. Percentage error for the algorithms: MH, SMC and PMMH for FemaleFigure 4. Percentage error for the algorithms: MH, SMC and PMMH for Male

Table 3 .
Estimated values of the population of female from 35 to 39 years old of the Republic of Ecuador by the World Bank (WB) and the algorithms MH, SMC and PMMH parameter σ for each algorithms with the data of the female and male sex, but no common convergence to the value of the parameter with respect to the convergence of each algorithm.

Table 5 .
and (6), the estimates of the states (growth of the considered population of the Republic A priori values to initialize the parameters of the MH, SMC and PMMH algorithms, for the female and male data

Table 6 .
Estimated values of the population of male from 35 to 39 years old of the Republic of Ecuador by the World Bank (BM) and the algorithms MH, SMC and PMMH STOCHASTIC MODELS TO ESTIMATE POPULATION DYNAMICS Figure 7. Percentage error for the algorithms: MH, SMC and PMMH for FemaleFigure 8. Percentage error for the algorithms: MH, SMC and PMMH for Male

Table 7 .
Estimated of the population of female from 35 to 39 years old of the Republic of Ecuador by the World Bank (WB) and the algorithms MH, SMC and PMMH

Table 8 .
Estimated values a posteriori of the parameters by means of the algorithms MH, SMC and PMMH, for the male data

Table 9 .
Estimated values a posteriori of the parameters using the MH, SMC and PMMH algorithms, for female data