Bayesian Unit Root Test for AR(1) Model with Trend Approximated by Linear Spline Function

The objective of the present study is to develop a time series model that handles a non-linear trend process using a spline function. A spline function is a piecewise polynomial in the time component. Its main advantage is that it approximates a non-linear time trend overall while keeping the trend linear between consecutive join points. A unit root hypothesis is formulated to test for non-stationarity due to the presence of a unit root in the proposed model. In the autoregressive model with a linear trend, the time trend vanishes under the unit root case. However, when a non-linear trend is present and approximated by the linear spline function, the trend component is again absent under the unit root case, but the intercept term shifts at each of the r knots. For decision making under the Bayesian perspective, the posterior odds ratio is used for hypothesis testing problems. We derive the posterior probabilities of the assumed hypotheses under appropriate prior information. A simulation study and an empirical application are presented to examine the performance of the theoretical results.


Introduction
The Box-Jenkins approach for analyzing and modelling time series has emerged as one of the most widely accepted fields of data analysis, in which one can integrate information from the recorded data through a modelling approach. It assists in recognizing the order dependence within observations and the linear dependence among a set of variables, and in making predictions. To achieve this, the first objective of the researcher is to expose the key features of the time series, such as linearity, stationarity, trending, and co-integration. One can then draw inferences about the data generating process through the best-fitted model, using suitable estimation, testing, and model selection procedures. In a time series model, linear dependence occurs when the series is nearly symmetric to a lagged version of itself and the disturbance coefficients are constant over time, i.e., the series is stationary in its mean, variance and autocorrelation. In many applications, however, series exhibit non-linear dynamics arising from non-stationarity, strong asymmetry, irregular timing, seasonal variation, etc. Non-stationarity in a time series arises from various components such as a time trend, the presence of a unit root, structural breaks, and outliers. If a time series has a unit root and becomes stationary after taking the first difference, it is termed difference stationary or integrated of order one. The classical unit root tests are discussed by [1], [9], and [8]. [2] considered a time series model with a polynomial trend in which the trend component does not disappear under the maintained hypothesis of a unit root, and developed a classical test statistic for the presence of a unit root. The classical tests are largely based on asymptotic justification and often have low power, particularly in finite samples.
[3] and [4] demonstrated that Bayesian unit root tests based on flat prior assumptions perform better than classical methods. [5] used a Bayesian approach for testing the presence of unit root in various real exchange rate time series. [6] derived the posterior odds ratio for testing the unit root hypothesis under a vague prior assumption. [7] studied the unit root hypothesis for an autoregressive model with a polynomial trend under a Bayesian framework.
To model non-linear trend components, one may require a high-order polynomial, whose coefficient estimates are usually unstable. An alternative to fitting a higher-order polynomial is to approximate it by a spline function. A spline function consists of piecewise polynomial segments joined together at the knots in a way that ensures certain continuity properties. In other words, a spline function fits a low-order curve between consecutive join points, known as knots. A knot is a point at which the pattern of the series changes between intervals. [10] expressed that splines are the smoothest possible piecewise polynomials that retain a segmented nature, whereas [15] described splines as line or curve functions that are usually required to be continuous and smooth. [11] used a model based on a spline function for predicting the number of deaths due to cancer in the USA. [12] determined the number and positions of knots in a regression spline model using a new Gibbs sampler algorithm, where the model is expressed as a linear mixed model with a random effect term. [14] studied a Bayesian approach for modeling the partial linear model with AR(1) errors belonging to the scale mixture of normal (SMN) family of distributions. [13] used spline, Bayesian spline, and penalized spline regression methods to model the distribution graph of the ratio of exports to imports for Turkey. [17] considered the Buys-Ballot and classical decomposition methods to estimate the cubic trend as well as other components of the time series, and obtained the chain base and fixed base estimators with their statistical properties. [19] proposed a particle swarm optimization B-spline network to improve the prediction accuracy of non-linear time series. They adopted the sum of squared forecasting errors to evaluate the training effect of the B-spline network.
[18] presented a new cubic B-spline approximation method for solving second order singular boundary value problems with applications in the physiological sciences. [16] considered smoothing spline (SS) and penalized spline (PS) methods for estimating the unknown functions in a conditional heteroscedastic non-linear autoregressive (CHNLAR) model and concluded that the SS method outperformed the PS method.
In the present paper, the main emphasis is on building a Bayesian approach for testing the unit root in an autoregressive (AR) time series model with a non-linear time trend. The non-linear time trend is approximated by a linear spline function. For unit root testing, we obtain the posterior odds ratio for the considered hypotheses under appropriate prior assumptions. Because of the complex form of the posterior probability, the Monte Carlo integration technique is used to evaluate the posterior odds ratio. The precision of the linear spline function in approximating the time trend of the AR model is justified in both simulation and empirical studies. In the empirical application, a dataset on the import series of ASEAN Regional Forum (ARF) countries is used to demonstrate the applicability of the linear spline function in the time series model.

Model specification
Let us consider a time series model with a non-linear trend component, which is approximated by a linear spline function with $r$ join points $1 < t_1 < t_2 < \cdots < t_r < T$. The time series model with a spline function is then given by

$$y_t = \delta_0 + \delta t + \sum_{i=1}^{r} \psi_i s_i(t) + u_t, \qquad (1)$$

where $\{y_t;\ t = 1, 2, \ldots, T\}$ is the observed series, $\delta_0$ is the intercept term, $\delta$ is the trend coefficient, $r$ is the number of knots located at $t_1, t_2, \ldots, t_r$, $\psi_i$ is the coefficient of the $i$th knot, and $s_i(t)$ is a linear spline function defined as

$$s_i(t) = \begin{cases} t - t_i, & t > t_i, \\ 0, & \text{otherwise}. \end{cases}$$

Let $u_t$ be a stochastic error term that follows the AR(1) process

$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad (2)$$

where $\rho$ is the autoregressive coefficient and the $\varepsilon_t$ are independent and identically distributed normal random variables with zero mean and unknown variance $\tau^{-1}$.
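The trend-plus-AR(1) structure of this section can be illustrated with a short simulation. The following is a minimal sketch, assuming the usual truncated linear basis max(t - t_i, 0) for the spline term and an error process started at zero; the function names and the values chosen for the intercept and slope in the usage note are hypothetical and are not taken from the paper.

```python
import random


def linear_spline_basis(t, knot):
    """Truncated linear basis (assumed form): 0 up to the knot, then t - knot."""
    return max(t - knot, 0.0)


def spline_trend(t, delta0, delta, psi, knots):
    """Deterministic part: intercept + linear trend + one term per knot."""
    knot_terms = sum(p * linear_spline_basis(t, k) for p, k in zip(psi, knots))
    return delta0 + delta * t + knot_terms


def simulate_series(T, delta0, delta, psi, knots, rho, tau, seed=0):
    """Simulate y_1..y_T with AR(1) errors u_t = rho * u_{t-1} + eps_t,
    where eps_t ~ N(0, 1/tau)."""
    rng = random.Random(seed)
    sd = (1.0 / tau) ** 0.5
    u = 0.0  # assumption: the error process starts at zero
    series = []
    for t in range(1, T + 1):
        u = rho * u + rng.gauss(0.0, sd)
        series.append(spline_trend(t, delta0, delta, psi, knots) + u)
    return series
```

For instance, `simulate_series(80, 100.0, 0.5, [20.0, -50.0, 15.0], [20, 40, 60], rho=0.9, tau=5.0)` produces a series of length 80 with three knots, where the intercept 100.0 and slope 0.5 are illustrative choices only.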
Combining (1) with (2) gives model (3), a linear spline autoregressive model of order one with a partially linear time trend. The main objective of this study is to test whether the model has a unit root in the presence of a trend component in the form of a linear spline function. The hypothesis under investigation is $H_0: \rho = 1$ against the alternative $H_1: \rho \in S$, $S = \{a < \rho < 1;\ a > -1\}$. Under the unit root hypothesis, model (3) reduces to model (4), in which the trend component is absent but the intercept term shifts at each of the $r$ knots. To write models (3) and (4) in matrix form, we define the vectors $y = (y_1, y_2, \ldots, y_T)'$, $y_{-1} = (y_0, y_1, \ldots, y_{T-1})'$ and $\Delta y = (\Delta y_1, \Delta y_2, \ldots, \Delta y_T)'$. Here $I$ is the $T \times T$ identity matrix and $L$ is the $T \times T$ matrix whose $(i+1, i)$th elements are equal to 1 and whose other elements are equal to 0. Using the notation in equation (5), the models under the unit root and alternative hypotheses are represented in matrix form.
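The step from (1) and (2) to the autoregressive form can be checked by direct substitution. The following is a sketch of that algebra, assuming the truncated linear basis $s_i(t) = \max(t - t_i, 0)$ with integer knots. It is written in the raw parameters $\delta_0$, $\delta$ and $\psi_i$, so the parametrisation used in equations (3) and (4) may group these terms differently.

```latex
% From (1): u_t = y_t - \delta_0 - \delta t - \sum_i \psi_i s_i(t).
% Substituting this expression for u_t and u_{t-1} into (2) gives:
\begin{align*}
y_t &= \rho y_{t-1} + (1-\rho)\delta_0 + \rho\delta + (1-\rho)\delta t
       + \sum_{i=1}^{r} \psi_i \bigl[ s_i(t) - \rho s_i(t-1) \bigr] + \varepsilon_t . \\
\intertext{Setting $\rho = 1$ removes the term in $t$:}
\Delta y_t &= \delta + \sum_{i=1}^{r} \psi_i \Delta s_i(t) + \varepsilon_t ,
\qquad
\Delta s_i(t) = s_i(t) - s_i(t-1) =
\begin{cases} 1, & t > t_i, \\ 0, & \text{otherwise}. \end{cases}
\end{align*}
```

Under $\rho = 1$ each knot therefore contributes a step of size $\psi_i$ to the drift of $\Delta y_t$ after $t_i$, which is the intercept shift described in the text.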

Posterior odds ratio
In statistics, the Bayesian approach bases inference on the sample information, i.e., the likelihood function, combined with prior information about the parameters, which are treated as random variables; that is, it makes use of our prior beliefs about the unknown parameters in light of the available information. In time series, the Bayesian approach is frequently used to identify the most suitable model among models with different characteristics, to test stationarity, to estimate unknown quantities, to compare unit root models, and to forecast future values. The likelihood functions under the unit root hypothesis $H_0: \rho = 1$ and under the alternative hypothesis $H_1$ are obtained from the corresponding models. From a strict Bayesian point of view, one cannot claim that an assumed prior is more relevant or better than others. The selection of prior distributions is to some extent arbitrary, and there is no explicit way to justify it. For this study, we adopt prior distributions for the parameters of the proposed model from [5] and [7], including a prior distribution for $\gamma$.
Stat., Optim. Inf. Comput. Vol. 8, June 2020
The joint prior distributions are specified for the parameters under the null hypothesis, $\Theta_0 = (\delta, \psi, \tau)$, and under the alternative hypothesis, $\Theta_1 = (\rho, \gamma, \psi, \tau)$. Combining the prior distributions with the sample likelihoods gives the posterior distributions under the null and alternative hypotheses, where $K_0$ and $K_1$ are the corresponding normalizing constants. Under the Bayesian perspective, the posterior odds ratio (POR) is used in decision making for hypothesis testing and model selection. The POR ($\beta_{01}$) is the product of the prior odds ratio and the Bayes factor of the null ($H_0$) against the alternative ($H_1$) hypothesis.
In other words, the POR is the ratio of the posterior probabilities of the two hypotheses given the observed series, obtained by multiplying the ratio of the marginal likelihoods by the ratio of the prior probabilities of the null ($p_0$) and alternative ($1 - p_0$) hypotheses. The POR is expressed as

$$\beta_{01} = \frac{p_0}{1 - p_0} \cdot \frac{P(y \mid H_0)}{P(y \mid H_1)}.$$

We get the posterior probabilities under $H_0$ and $H_1$ by integrating equations (20) and (21) over the model parameters.
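The product of prior odds and Bayes factor described here is simple to compute once the two marginal likelihoods are available. The sketch below assumes they are supplied on the log scale; the names `posterior_odds`, `decide`, `log_m0` and `log_m1` are hypothetical, and treating a POR of exactly one as acceptance of the null is a convention chosen for this illustration.

```python
import math


def posterior_odds(log_m0, log_m1, p0=0.5):
    """Posterior odds of H0 against H1: prior odds times Bayes factor.

    log_m0 and log_m1 are the log marginal likelihoods of the data under
    H0 and H1 (hypothetical inputs); p0 is the prior probability of H0.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    log_prior_odds = math.log(p0) - math.log(1.0 - p0)
    # Work on the log scale so that very small likelihoods do not underflow.
    return math.exp(log_prior_odds + log_m0 - log_m1)


def decide(por):
    """POR below one rejects the unit root hypothesis; otherwise it is accepted."""
    return "reject H0" if por < 1.0 else "accept H0"
```

With equal prior probabilities (`p0=0.5`) the prior odds equal one, so the POR reduces to the Bayes factor.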
The following notations are used to describe the posterior probabilities

Theorem 1
For testing the unit root against the alternative of trend stationarity, the posterior odds ratio with prior odds ratio $p_0/(1 - p_0)$ is obtained as given in equation (23). The detailed derivation of equation (23) is given in the Appendix.

The expression for the POR in equation (23) involves an integral that is not easy to evaluate using general computational techniques. Hence, the Monte Carlo integration technique is used to approximate the integral and evaluate the POR.
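The idea of Monte Carlo integration over the autoregressive coefficient can be sketched as follows. This is a generic illustration rather than the integrand of equation (23): the function name `mc_integral` and the integrand `g` are hypothetical, and the draws are taken from a uniform distribution on (a, 1), in line with the uniform prior range for the coefficient.

```python
import random


def mc_integral(g, a, b=1.0, n_draws=5000, seed=0):
    """Estimate the integral of g over (a, b) by averaging g at uniform draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += g(rng.uniform(a, b))
    # The mean of g under U(a, b), times the interval length, estimates the integral.
    return (b - a) * total / n_draws
```

For example, `mc_integral(lambda rho: rho * rho, 0.0)` approximates the integral of rho squared over (0, 1), whose exact value is 1/3. The accuracy improves at the usual rate of one over the square root of the number of draws.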

Simulation study
This section discusses the performance of the proposed model based on simulated series. In the simulation, we generate time series from model (3) with an initial value of the observed series $y_0 = 100$, under the assumption that the number and locations of the knots are known. For each parameter setting, we replicate the simulated series 1000 times and record the average results. We consider different series lengths $T = (80, 120, 160, 200)$ with $r = 3$ spline knots at the locations $(T/4, T/2, 3T/4)$. To cover a range of scenarios, different sets of model parameters are considered: $\rho = (0.85, 0.9, 0.95, 0.99, 1)$, $\beta = (0.5, 1, 1.5, 2, 2.5)$, $\varphi = (0.5, 1, 5, 10, 15)$, $\psi_1 = 20$, $\psi_2 = -50$, $\psi_3 = 15$ and $\tau = 5$. So, the distribution of the errors is $\varepsilon_t \sim N(0, 0.5)$. The hyperparameters of the assumed priors are $\phi = 2$, $\Omega = I_{3 \times 3}$ and $\psi_0$ equal to the mean of the $s_i(t)$ series for $i = 1, 2, 3$. To compute the posterior odds ratio, the Monte Carlo integration technique is used to approximate the integral involved in the posterior probability, and the technique is repeated 5000 times to obtain the posterior samples. The value of $a$ is obtained following the approach of [5], where $\rho$ is estimated using the uniform distribution $U(a, 1)$. In this illustration, we assign equal prior probabilities to the null and alternative hypotheses, which makes the prior odds ratio equal to one. The POR and the rejection rate of the null hypothesis (in parentheses) are given in Tables 1-5 for various sets of parameter values and different series lengths. Since the simulated series may have outliers at their assumed knots, we display histograms and box-plots of the POR values, which show the outlier effect in the simulated POR, in Appendix Figures 3-20.
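The bookkeeping of this simulation design (knot locations for a given series length, and the average POR with its rejection rate over replications) can be sketched as below. The helper names are hypothetical, integer division is assumed for the knot locations, and a replication is counted as a rejection when its POR is below one.

```python
def knot_locations(T):
    """Knots at T/4, T/2 and 3T/4 (integer division assumed)."""
    return (T // 4, T // 2, 3 * T // 4)


def summarise_por(por_values):
    """Return the average POR and the share of replications with POR < 1."""
    n = len(por_values)
    if n == 0:
        raise ValueError("at least one POR value is required")
    mean_por = sum(por_values) / n
    # A replication rejects the unit root hypothesis when its POR is below one.
    rejection_rate = sum(1 for v in por_values if v < 1.0) / n
    return mean_por, rejection_rate
```

For a series of length 80, `knot_locations(80)` gives the knots (20, 40, 60). In a full study, `summarise_por` would be applied to the 1000 POR values obtained for each parameter setting.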
From Tables 1-5, we observe that when the time series is short ($T \le 120$), the POR exceeds one for some cases with $\rho \le 0.9$ and $\beta = 0.5$, i.e., the unit root hypothesis is wrongly accepted for the simulated series. This happens because the linear spline function partitions the series into short linear segments with an approximated time trend. However, the POR is less than one for moderately long series for all $\rho < 1$ and for high values of $\beta$ and $\varphi$. For $\rho = 1$, the POR is greater than one for all parameter settings and sample sizes, so the unit root hypothesis is accepted in all cases. We also notice that the POR declines steadily as $\beta$, $\varphi$ and $T$ increase, as long as $\rho < 1$.

Real data analysis
This section provides a real application to illustrate the theoretical results. In real applications, a linear trend may not provide an appropriate model, and a non-linear trend may provide a better fit. Such a non-linear trend may be approximated by piecewise polynomials using the spline function. For the analysis, we consider the monthly import series (in billion US dollars) of 14 selected ASEAN Regional Forum (ARF) countries for the period April 2012 to August 2018. This data set is taken from the International Monetary Fund and International Financial Statistics data portal. First, we apply the Ljung-Box test to each time series to check whether the residuals are independent, where the null hypothesis is that the residuals are independent. Then, model selection criteria are used to determine the appropriate number and locations of knots in each import series. Here, we assume that the maximum number of knots in a particular time series is $r = 3$. Tables 6-7 report the p-value of the Ljung-Box test, and the number of knots ($r$) and their locations corresponding to the minimum values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), for the different import series.
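Choosing the number and locations of knots by information criteria can be illustrated with a small grid search. The sketch below is not the authors' procedure: it assumes the truncated linear basis, fits the trend by ordinary least squares with Gaussian errors, ignores the autoregressive error structure, and searches only over a user-supplied list of candidate knots. The function names are hypothetical.

```python
import itertools

import numpy as np


def design_matrix(T, knots):
    """Columns: intercept, linear trend, and one truncated linear term per knot."""
    t = np.arange(1, T + 1, dtype=float)
    cols = [np.ones(T), t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)


def information_criteria(y, knots):
    """Return (AIC, BIC) of the least-squares spline fit with Gaussian errors."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    X = design_matrix(T, knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    loglik = -0.5 * T * (np.log(2.0 * np.pi * rss / T) + 1.0)
    return 2 * k - 2 * loglik, k * np.log(T) - 2 * loglik


def select_knots(y, candidates, max_knots=3, criterion="bic"):
    """Pick the knot set (up to max_knots knots) with the smallest criterion value."""
    best = None
    for r in range(max_knots + 1):
        for knots in itertools.combinations(candidates, r):
            aic, bic = information_criteria(y, knots)
            score = bic if criterion == "bic" else aic
            if best is None or score < best[0]:
                best = (score, knots)
    return best[1]
```

The search is exhaustive over all subsets of the candidate list, so the candidate grid should be kept coarse for long series.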
From Tables 6-7, we observe that, of the 14 ARF countries, the null hypothesis is not rejected at the 5% level of significance for 11 countries, whereas for the remaining 3 countries (Taiwan, Indonesia, Thailand) it is not rejected at the 1% level of significance. We conclude that the residuals are independent in each series, as required for the model to be adequate. Tables 6-7 also show that each series contains a particular number of join point(s) at specific locations, which vary with the information criterion. On the basis of the minimum values of AIC and BIC, we observe that three knots are present in the time trend for Japan, India, and Thailand, whereas two knots are present for Australia, South Korea, Taiwan, Indonesia, and Malaysia. The remaining countries, i.e., Hong Kong, New Zealand, China, Philippines, Singapore, and Viet Nam, have one knot each. Figures 1-2 display the time series plot of each country with piecewise linear segments based on the knots identified in Tables 6-7.
Once the join point(s) of the spline function are identified, the posterior odds ratio is computed to test for the presence of a unit root using the derived theorem, and the estimated value of the autoregressive coefficient is also recorded. Tables 8-9 present the POR values along with the estimated values of $\rho$ for the developed and developing ARF countries, respectively. In Tables 8-9, we observe that the POR is less than one, leading to the rejection of the null hypothesis of a unit root. We conclude that the import series of all the ARF countries do not have a unit root when the non-linear trend component is approximated by a linear spline function in the autoregressive model.

Conclusion
In this paper, we propose a Bayesian unit root test for an autoregressive model with a non-linear time trend. The non-linear time trend is approximated by a linear spline function. The linear spline function consists of linear pieces joined at $r$ knots, allowing a change in slope at time point $t_i$ for the $i$th knot. For testing, the posterior odds ratio is derived for the assumed hypotheses under suitable prior distributions. In the simulation study, the posterior odds ratio supports the proposed model for various combinations of model parameters. An empirical analysis of the import series of selected ARF countries is carried out to illustrate the performance of the proposed model. The number and locations of knots are identified using two information criteria, AIC and BIC. The results indicate that the unit root hypothesis is rejected for the import series of all countries once the knots are properly selected.