Development of Oil Production Forecasting Method based on Deep Learning

Identification of quick declines in the desirable production fluids and rapid increases in the undesirable fluids are key production problems of oil wells. The main purpose of this work is to develop a method that can forecast oil production with high accuracy, using Deep neural networks based on the debit (flow-rate) data of wells. In this paper, a hybrid model based on a combination of CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) networks, called CNN-LSTM, is proposed for forecasting oil production time series. The architecture of the proposed CNN-LSTM model is hierarchical: first, the CNN layer of the model is applied to the current time window, and then the relationship between time windows is predicted by applying the LSTM. The challenges of time series prediction often come from the continuity in time of every state. To overcome this problem, we predict the temporal dependency within a certain time window; this issue is solved by the application of the CNN algorithm. The efficiency of the proposed model is evaluated on the QRI dataset. The prediction accuracy of the method is measured with the RMSLE loss function, and the best results in the testing process are obtained with the proposed model.


Introduction
The existing oil production industry is built on outdated technologies. Such systems can lead to a decreased debit (production rate) of oil wells, increasing costs of the oil production process, and so on. Under these conditions, effective development of oil fields can increase oil production and the oil transfer capability of reservoirs, extend the life cycle of oil fields, and have great importance for economic efficiency.
Identification of the quick declines of the desirable production fluids, and rapid increases in the undesirable fluids are the production problems of the oil wells.
The application of Deep neural networks to the solution of such urgent issues can lead to successful results. Applying these methods to the oil production control and management field can yield high efficiency in various tasks, such as the prevention of inefficient use of energy, optimization of oil extraction time, control of equipment condition, and collection, storage and processing of current and historical data. Recently, there has been a revival in the application of Deep neural networks in the oil industry by the world's leading oil companies. Chevron is one of several companies that point out that Deep learning is the most appropriate technology for reservoir characterization. In 2016, researchers at Chevron led an effort to employ Deep neural networks and fuzzy kriging to analyse the viability of a reservoir in California's San Joaquin Valley [1].

From another point of view, researchers from the SAS Institute have found that merging traditional seismic analysis methods with Deep learning methods is a more effective tool for the efficient discovery of resources in the upstream sector.
In [2] a supervised and unsupervised approach is proposed to characterize the reservoir on the seismic profiles. Here the seismic images are used as the input data of the proposed Deep Learning method. During the preprocessing phase, patches are created from larger image sets that can reduce the number of features required to represent an image and can decrease the training time needed by algorithms to learn from the images.
In recent years, as Deep learning methods have gained great popularity in various research areas as a new approach, researchers have started using CNNs in their work to recognize high-level patterns in multidimensional time series data. CNNs have two key features: weight sharing and spatial pooling. These features have turned the CNN method into a very useful tool for computer vision applications. In such applications, the input data of the CNN is 2-dimensional (2D). However, the CNN model is also applied to natural language processing and speech recognition problems, in which case the input data of the CNN is 1-dimensional (1D) [3,4]. A CNN with one layer performs feature extraction from the input signals by applying the convolution operation. Usually, a 1D kernel function is used when the input contains one-dimensional time series data. In other words, the convolution operation is applied separately to each dimension of the data [5]. In addition, some hybrid models have been proposed that combine CNN and LSTM models for forecasting and classifying time series by learning complicated features [6].
Recently, researchers have conducted extensive investigations into the application of Deep learning methods to oil production forecasting. In [7], a method for forecasting oil production time series based on multilayer neural networks with Multi-Valued Neurons (MLMVN) is proposed. In this work, the forecasting capability of the MLMVN model for predicting the dynamics of the reservoir is demonstrated. A dataset consisting of monthly production data from 14 wells of oil fields located on the shores of the Gulf of Mexico is used to test the proposed forecasting model. In [8], the application of high order neural networks (HONN) to water, oil and gas production forecasting is considered. In [9], a natural gas prediction method based on neural networks is proposed. In [10], the application of multiple artificial neural networks (MNN) to evaluating the future production performance of oil wells based on time series of monthly production data is considered. An MNN is a group of single artificial neural networks (ANN) which cooperate with each other to solve a specific problem; each network makes predictions for a different time period. The results obtained there show that the MNN model produces better long-term predictions than a single ANN model. In that work, the data for the experiments are taken from the Saskatchewan Energy and Field center. The activation function of the neural network is the sigmoid. Since the value of the sigmoid function lies in the interval [0, 1], the data are normalized using the min-max algorithm. Deep learning methods such as CNN, LSTM, DBN and others are also applied in the oil production forecasting field.
Since CNN is designed for short-term forecasting, it cannot model long-term time series sequences. This is related to the inability of CNN models, which are limited by the convolution layer, to capture time series dynamics efficiently. To resolve this problem, researchers use LSTM networks with memory cells. LSTM uses the concept of gates in its memory cells; this mechanism allows the network to learn which hidden states it needs to forget and which it needs to refresh [11]. LSTM can learn the long-term temporal dynamics of data in the form of sequences and accepts unprocessed data as input. Applying LSTM to features extracted from unprocessed data by other Deep learning methods allows achieving higher forecasting efficiency [12].
In some approaches, LSTM acts as the predictive block of the proposed models [13]. In [14], a combination of convolutional and recurrent layers is applied to the time series classification task, and strong results are achieved.
To ensure accurate forecasting of oil production in oil wells, it would be expedient to use a combination [15] of the above-mentioned models.
In this paper, a hybrid model based on a combination of the CNN and LSTM networks is proposed for forecasting oil production time series. Here, the CNN layer of the model is first applied to the current time window to extract features, and then the relationship between the time windows is predicted by applying the LSTM.

DEVELOPMENT OF OIL PRODUCTION FORECASTING METHOD
In the proposed model, the objective of the CNN layer is to extract features, and that of the LSTM layer is to perform prediction. The main contributions of this work are: 1. the effectiveness of Deep neural networks in oil production forecasting is investigated; 2. a new architecture containing improved Deep CNN and LSTM blocks is proposed for efficient forecasting of oil production in wells; 3. the proposed model achieves high accuracy in the forecasting process.
This paper is organized as follows: Section 2 summarizes some of the methods used in oil production time series prediction. Section 3 presents the architecture of the proposed CNN-LSTM model. Section 4 describes the dataset. Section 5 presents the results of the comparative analysis of the proposed method against existing methods. Section 6 concludes the work.
Each layer of the deep network is trained independently in a greedy, layer-wise pre-training procedure. This provides a good initial approximation from which to run the backpropagation algorithm. Depending on the selected model, each layer may be an RBM or a CNN (Convolutional Neural Network) [20].
A Boltzmann Machine (BM) is a network of symmetrically connected stochastic binary units. The units are divided into two groups, describing visible and hidden states (by analogy with hidden Markov models). The states of the visible and hidden neurons vary according to probabilistic activation functions.
A Restricted BM (RBM) is a BM that has no connections between the neurons of the hidden layer. Due to this special bipartite graph structure, the probabilities of the hidden-layer neurons can be computed exactly. If a sufficient number of neurons is used in the hidden layer, an RBM can approximate any discrete distribution.
The RBM is the key structural unit for constructing the Deep Belief Network (DBN). A DBN is a multilayer network [21] in which the lower layers form a sigmoid belief network and the upper layer is an RBM.
A Deep BM (DBM) is sometimes used in the pre-training step instead of the autoencoder. The multilayer architecture of the DBM is its main difference from the RBM.
CNN is a multilayered neural network with a special architecture to detect complex features in data. CNNs have been used in image recognition, powering vision in robots, and for self-driving vehicles.
LSTM recurrent neural networks are capable of learning and remembering over long sequences of inputs. LSTMs work very well when the problem has one output for every input, as in time series forecasting. But LSTMs can be challenging to use when the problem has very long input sequences and only one output (or a small number of outputs).
A hybrid DL architecture integrates generative and discriminative architectures. The Deep Neural Network (DNN) can be given as an example of a hybrid architecture. In [22], the DNN is a cascade of fully-connected hidden layers and often uses an RBM stack as a pre-training stage.
The main purpose of this work is to develop a method that can predict oil production with high accuracy using Deep neural networks based on the debit (flow-rate) data of wells.
The oil and gas supply chain consists of three streams: upstream, which covers the exploration, development, and production of oil and gas; midstream, which includes the transportation of oil and gas by tanker; and downstream, which includes the refining and sales processes. This paper addresses the upstream level of the oil industry. The oil production processes are modeled on the basis of hydrodynamic numerical evaluation of the processes in the reservoir, using a dataset containing historical data on the development of the oil fields, equipment characteristics, the time-varying geological characteristics of the reservoir, well operation modes, and well operation and break times.
The data required to predict oil production are divided into the following groups:
1. Time and periodicity of information recording, determined by the recording time of the measurements.
2. Characteristics of injection wells: the load volumes, the acceleration, the well operation time, the coordinates and numbers of the wells, and so on.
3. Characteristics of production wells: the volume of produced water and oil, liquid separation, debit of the wells by oil and gas, total debit, total production, operating time, coordinates of wells, number of production wells, and so on.
Depending on the input data, a number of Deep learning architectures have been proposed. One of the special research directions of the oil and gas industry is reservoir characterization. In [23], Deep neural networks are used to predict the properties of the oil reservoir, such as porosity, permeability, pressure-volume-temperature (PVT), depth, drive mechanism, structure and seal, diagenesis, well spacing and well-bore stability. Some of these reservoir properties are used to detect drilling problems, to determine reservoir quality, to optimize reservoir architecture, to identify lithofacies, and to measure reservoir volume. Various methods that perform petroleum reservoir characterization based on the hybridization of different algorithms with neural networks are also reviewed there. In [24], the reservoir characterization problem is addressed with neural networks.
In [25], by applying the kernel method to the Arps decline model, a new nonlinear multidimensional forecasting model, called the nonlinear extension of the Arps decline model (NEA), is proposed. The base structure of the NEA model is the Arps exponential decline equation, and nonlinear combinations of the input time series are created by applying the kernel method to the model. It can effectively determine the nonlinear relation between the input time series and the oil production. To evaluate the effectiveness of the NEA model, experiments are conducted on data taken from oil fields in China and India. To further improve the ability of the model, the decline curve methods are combined with intelligent methods.
In [26], an oil well production model based on the MLP method is proposed using production data. In [27,13], an LSTM-type recurrent neural network is used for the recognition of top-level templates and for value forecasting, in order to study the temporal and sequential features of the time series. While the above-mentioned methods provide good results in the recognition of templates, they encounter great difficulties in recognizing temporal features as a sequence.

Proposed method
In recent years, the combination of CNN and LSTM layers has gained more attention [28,29,30]. Two types of combinations exist. The first group combines them by applying separate convolutional or LSTM layers one after another. The second group embeds convolutions into LSTMs or general RNNs.
In this section, we introduce a CNN+LSTM Deep network for time series prediction. The architecture of the proposed CNN+LSTM Deep Learning model is shown in Figure 1. Our model has two major components: the CNN layer and the LSTM layer. These layers are stacked from bottom to top; the former captures features from the sensor sequence within sliding windows, and the latter captures features from the sequence of states.
The algorithm of the proposed CNN-LSTM model for predicting oil production is as follows: Step 1. Determination of the oil well characteristics. For a time series problem, the observation from the last time step (t-1) is used as the input (a sequence of historical values) and the observation at the current time step (t) is used as the output (the value at the next timestamp).
Step 2. Building training samples based on CNN. In our case, 48 samples are used for training the neural network and 12 samples are used for forecasting. Note that there is no minimum or maximum training or testing sample size; in the proposed model, the number of samples can be of any size. Generally, a higher number of training samples ensures better performance.
Step 3. Building the network of the proposed hybrid Deep Learning architecture. Input the oil production debit data into the constructed network and train the neural network on these data.
Step 4. After training the neural network, perform the testing phase and find the required solution.
Step 5. Calculation of the loss caused by prediction. Here, time series are used to perform the forecasting. A time series is a sequence of real-valued data points with timestamps, generated by $D$ different sensor channels. The raw data $x_{t_i}$ at any timestamp $i$ is a multidimensional vector that can be described as a tuple of measurements. The challenges of time series prediction often come from the continuity in time of every state. To overcome this problem, we try to predict the temporal dependency within a certain time window; this issue is solved by the application of the CNN algorithm.
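As a sketch of Steps 1 and 2, the sliding-window supervised framing described above (48 past months as input, the following 12 months as the target, matching the chunk sizes used later in the paper) can be written as follows. The function name and the toy series are illustrative, not from the paper:

```python
# Sketch: turn a monthly production series into supervised samples.
# n_in=48 input months and n_out=12 target months follow the paper's setup.
def make_windows(series, n_in=48, n_out=12):
    """Slide a window over the series; each sample pairs n_in past
    values (input) with the following n_out values (target)."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return X, y

series = list(range(70))             # toy stand-in for debit data
X, y = make_windows(series)
print(len(X), len(X[0]), len(y[0]))  # -> 11 48 12
```

With a 70-point toy series this yields 11 overlapping samples; a real 60-point chunk yields exactly one 48-in/12-out pair.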

Convolutional neural networks
Filtering time series data is an important tool for improving prediction performance, and CNNs allow good filters to be learned automatically. Although the success of CNNs emerged from the vision domain [31], they have also demonstrated potential in time series applications, for example in activity classification [32]. One main difference from most classical filters is that convolutional filters are multivariate and thus combine inputs. We define the 1D convolution operator at some time step $t$ as

$$(x * k)_t = \sum_{i=0}^{|k|-1} k_i \, x_{t-i},$$

where $k$ is a kernel vector that defines the filter. Successively applied operations lengthen the amount of time covered by a filter and lead to potentially more meaningful, but more abstract, features. A CNN then consists of multiple convolutional layers and a dense layer at the end. The number of filters (neurons) of a convolutional layer corresponds to the number of output channels. All filters are applied to the input series $X_t^{\delta} = (x_{t-(\delta+1)}, \ldots, x_t)$ with time frame length $\delta$ at time $t$:

$$c_j = \phi\Big(\sum_{i=0}^{|k|-1} k_i \, x_{t-(\delta+1)+js+i} + b\Big),$$

where the stride $s$ skips intermediate steps. This layer uses fewer trainable weights than a dense layer, which would learn on the whole of $X_t^{\delta}$ instead of local windows, because the kernel is shared across all time steps. In the proposed architecture, the structures of the individual convolutional subnets for the inputs $p_i$ at different times are the same. Assume that the input $p_i = \{x_{t_1}, x_{t_2}, \ldots, x_{t_l}\}$ is an $L_0 \times D_0$ tensor, where $L_0$ equals the sequence length $l$ of the sliding window and $D_0$ equals the number of sensor channels $D$. For each time interval $l$, the matrix $p_i$ is fed into the CNN architecture. To learn the temporal dynamics of the input $p_i$, 1D filters of shape $(k, 1)$ are applied. In this paper, the size of the filters in every convolutional layer is the same, and the convolution is only computed where the input and the filter fully overlap.
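The "valid" 1D convolution with stride described above can be sketched in plain NumPy. This is an illustrative reconstruction, not the authors' code; the difference kernel is just an example filter:

```python
import numpy as np

def conv1d(x, k, s=1):
    """'Valid' 1D convolution: computed only where input and filter
    fully overlap, with stride s skipping intermediate steps."""
    out = [sum(k[i] * x[t + i] for i in range(len(k)))
           for t in range(0, len(x) - len(k) + 1, s)]
    return np.maximum(out, 0.0)   # ReLU activation, as in the paper

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k = np.array([-1.0, 1.0])         # toy difference filter (illustrative)
print(conv1d(x, k))               # -> [1. 1. 1. 1.]
print(conv1d(x, k, s=2))          # -> [1. 1.]
```

As in most deep learning frameworks, this computes cross-correlation rather than flipping the kernel; for a learned kernel the distinction is immaterial.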
For each convolutional layer, the model learns $f$ filters, through which it obtains more nonlinear functions and learns more global information about the current sequence, using ReLU as the activation function. The convolutional layer is not followed by a pooling operation, as the next LSTM layer requires a data sequence to process. The feature maps output by the $m$ convolutional layers have shape $L_m \times f$, where $L_m$ is the temporal length remaining after $m$ valid convolutions. We then flatten the output matrix of the layers into a vector $V_i$, which is considered the feature representation of a high-level pattern.

LSTM networks
Unlike dense layers, LSTM utilizes the temporal information of a series by sequentially processing the data. This differs from convolutional layers, whose kernel weights are trained to find features throughout the time series. RNNs have a memory of previous inputs and outputs because they feed the previous output back to themselves through hidden weights. In other words, the input series $X_t^{\delta}$ is processed sequentially, vector by vector, over an increasing number of time steps $\delta$. LSTM is one type of RNN model. In a plain RNN, as the number of time steps $\delta$ increases, the gradients begin to vanish exponentially, which impedes the learning process. Unlike a plain RNN, the LSTM applies gating mechanisms that prevent the loss of far-away information, and it is excellent at time series prediction tasks [33]. In an LSTM layer, the hidden weights $h$ are adjusted at every time step $t' \in (t-(\delta+1), \ldots, t)$ by taking the elementwise product ($\circ$) of an output gate $o$ and the activation of a cell $c$. This cell determines how much of the previous cell state is retained through a forget gate $f$ and adds to it the product of an input gate $i$ and an input modulation $j$. All gates consider the hidden weights from the preceding time step. A formal representation is as follows:

$$i_t = \phi_\sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i),$$
$$f_t = \phi_\sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f),$$
$$o_t = \phi_\sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o),$$
$$j_t = \phi_{\tanh}(W_{xj} x_t + W_{hj} h_{t-1} + b_j),$$
$$c_t = f_t \circ c_{t-1} + i_t \circ j_t,$$
$$h_t = o_t \circ \phi_{\tanh}(c_t),$$

with non-linear activation functions $\phi$ (tanh, sigmoid, ReLU), weight matrices $W$, and bias vectors $b$, where the relations to the gates are visible in the subscripts. As a result of the operation of the convolutional layers, the sensor data $p_i = \{x_{t_1}, x_{t_2}, \ldots, x_{t_l}\}$ in a sliding window have been processed into a feature vector $V_i$. Concatenating all $n$ vectors $\{V_1, V_2, \ldots, V_n\}$ yields an $n$-row matrix $V$, which is the input of the LSTM layer. The output of the LSTM at every time step is then passed into a ReLU output layer which yields the prediction outcome $y_i$. So the input of this part is $V$ and the output is $Y$.
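A single LSTM step with the gates named above (input gate $i$, forget gate $f$, output gate $o$, input modulation $j$) can be sketched as follows. The weight shapes, the concatenated-input formulation, and the random initialization are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps gate name -> weight matrix acting on
    the concatenation [x, h_prev]; b maps gate name -> bias vector."""
    z = np.concatenate([x, h_prev])
    i = sigmoid(W["i"] @ z + b["i"])   # input gate
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate
    o = sigmoid(W["o"] @ z + b["o"])   # output gate
    j = np.tanh(W["j"] @ z + b["j"])   # input modulation
    c = f * c_prev + i * j             # retain old state, add gated input
    h = o * np.tanh(c)                 # hidden state passed onward
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                       # toy input and hidden sizes
W = {g: rng.normal(size=(d_h, d_in + d_h)) for g in "ifoj"}
b = {g: np.zeros(d_h) for g in "ifoj"}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, b)
print(h.shape, c.shape)                # -> (4,) (4,)
```

Since $h_t = o_t \circ \tanh(c_t)$ with both factors bounded, every hidden activation stays in (-1, 1), which is one reason the gated cell resists exploding activations.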

Dropout
Dropout has established itself as a new and reliable regularizer that helps to avoid overfitting [34]. Essentially, dropout deactivates a random part of the input of every neuron with probability $p$. Here, we define it formally as follows.
Every weight matrix $W$ is now interpreted as a random matrix that contains the weight matrix $\hat{W}$ together with a different dropout vector $z$ for every neuron:

$$W = \hat{W} \cdot \mathrm{diag}(z), \qquad z_i \sim \mathrm{Bernoulli}(1-p),$$

so each element of $z$ is set to zero with probability $p$ (dropping the corresponding unit) and to one otherwise. Furthermore, the dropout vector changes at every training step. For LSTM layers, not only the input should be partially dropped, but also the recurrent units; to that end, the same masking is applied to the recurrent weight matrices as well.
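The per-neuron dropout vector $z$ above can be sketched as a Bernoulli mask; this is illustrative, and the mask is redrawn at every training step as noted:

```python
import numpy as np

def dropout_mask(n, p, rng):
    """Binary dropout vector z: each element is 0 with probability p
    (the unit is dropped) and 1 otherwise."""
    return (rng.random(n) >= p).astype(float)

rng = np.random.default_rng(42)
z = dropout_mask(10_000, 0.3, rng)
print(round(z.mean(), 2))   # kept fraction, approximately 1 - p = 0.7
```

At inference time, frameworks either rescale the kept activations by 1/(1-p) during training ("inverted dropout") or scale the weights by 1-p afterwards, so that expected activations match.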

Dataset description
The Deep Learning method proposed in this paper is tested on the dataset provided by QRI (Quantum Reservoir Impact). This dataset consists of data from seven reservoirs. Field name, formation type, well name, oil production, water production and gas production are the features of this dataset. Each of the seven reservoirs consists of several wells, and each well has a name. The oil, water and gas production of each well is recorded every month. In this paper, only oil production is considered. Each well's series is divided into smaller chunks (parts), each containing 60 points (the points represent months). Of these, 48 points are the input data and the remaining 12 months (the points to be predicted) are the outputs of the proposed system.
To standardize the data, each chunk is normalized. In addition, the data are split into training, validation and test sets according to the oil well index: the training data consist of 5041 chunks, the validation data of 1026 chunks, and the test data of 1144 chunks.
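The paper does not state which normalizer is applied to the chunks; a min-max scaling to [0, 1], like the one used with the sigmoid networks in [10], is one plausible choice and can be sketched as:

```python
# Sketch (assumed normalizer, not confirmed by the paper): min-max
# scaling of a single chunk of production values to the range [0, 1].
def min_max(chunk):
    lo, hi = min(chunk), max(chunk)
    if hi == lo:                        # constant chunk: map to zeros
        return [0.0 for _ in chunk]
    return [(v - lo) / (hi - lo) for v in chunk]

print(min_max([10.0, 15.0, 20.0]))      # -> [0.0, 0.5, 1.0]
```

Whatever scaler is used, its parameters should be fitted on the training chunks only and reused on validation and test data to avoid leakage.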
Efficiency assessment metric of the model (RMSLE)
In this paper, the Root Mean Squared Logarithmic Error (RMSLE) is used to evaluate the accuracy of the proposed method. RMSLE compares the predicted values with the true values and is calculated as

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big(\log(a_i + 1) - \log(b_i + 1)\big)^2},$$

where $n$ is the total number of observations in the testing dataset, $a_i$ is the predicted value, and $b_i$ is the actual value. RMSLE measures the error rate; a low RMSLE value indicates a better forecasting solution.
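The RMSLE defined above can be computed directly from the formula (a transcription for illustration, not the authors' code):

```python
import math

def rmsle(predicted, actual):
    """Root Mean Squared Logarithmic Error: a_i predicted, b_i actual.
    log1p(v) computes log(1 + v), matching the +1 in the definition."""
    n = len(predicted)
    s = sum((math.log1p(a) - math.log1p(b)) ** 2
            for a, b in zip(predicted, actual))
    return math.sqrt(s / n)

print(rmsle([100.0, 200.0], [100.0, 200.0]))  # -> 0.0 (perfect forecast)
```

Because of the logarithm, RMSLE penalizes relative errors rather than absolute ones, which suits production series whose levels vary widely across wells.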

Experiments
The dataset is divided into three subsets: training, validation and testing. The training, validation and testing data consist of 5041, 1026 and 1144 monthly production chunks, respectively, taken from several wells.
Here the training data is used to train the neural network. The validation data is used to determine the effectiveness of the neural network on samples not used in the training process. Training and validation are done simultaneously, and these two sets are used to configure the neural network parameters. The test data is used to evaluate the overall effectiveness of the model after the values of the neural network parameters have been determined.
A comparative analysis of the proposed CNN+LSTM method with the CNN and LSTM methods is conducted. Here the main purpose of the CNN+LSTM approach is to forecast the monthly production over various periods, from 1 month to 48 months.
As a result of this analysis, loss values of the LSTM, CNN and proposed hybrid CNN+LSTM models are calculated based on the RMSLE metric and added to Table 1.
As seen from Table 1, the results of the proposed hybrid CNN+LSTM model outperform those of the other models. Thus, in predicting the 12 points of the test data, the RMSLE value of the CNN+LSTM model is 0.186891, whereas for the CNN model this value reached 0.187198 and for the LSTM model 0.193466. RMSLE measures the degree of loss (the error rate), so a smaller RMSLE value indicates a more accurate model. As seen from these results, in assessing the oil productivity of the wells, the CNN+LSTM method is more effective than the other algorithms.
To better illustrate the results of Table 1, the loss dynamics of each model are depicted in Figure 2.
As seen from Figure 2, when training the CNN+LSTM model, the loss value decreases smoothly from the first iteration to the last step and, compared to the other methods, reaches the lowest value at the end. In contrast, the loss values of the standalone CNN and LSTM models do not vary smoothly: in some cases, the loss value of the next iteration is larger than that of the previous one. Since the loss of the proposed hybrid CNN+LSTM model during forecasting was lower than that of the other methods, this model is considered more effective.
To evaluate the significance of the methods, statistical significance tests are used, as is common in scientific papers. For this purpose, each method is run 70 times on the QRI dataset (Table 2).
As seen from Table 2, better results are achieved by the hybrid CNN+LSTM model. To verify the robustness of the prediction methods, a boxplot representation is created based on the values in Table 2 (Figure 3).
As seen from Figure 3, the hybrid CNN+LSTM model has achieved better results than the other methods. Here, because the values increase and decrease only by small leaps over the iterations, the boxplot of this method is tighter.
The proposed hybrid CNN+LSTM model has predicted the future oil production data for the next 12 months very accurately. The forecasts of the hybrid CNN+LSTM model on data taken from several chunks are depicted in Figure 4, which shows the production dynamics of the oil well as a function of time. In general, oil production changes for a variety of reasons. For example, well-stimulation operations cause changes in the wellbore area, which leads to an increase in production. A decrease in production is usually caused by reduced pressure in the reservoir or by the degradation of the mechanical condition of the production well. An effective way to slow down the degradation is to carry out additional recovery operations, such as water flooding. Another way to restore the pressure is to shut down the well for a certain period of time. If production drops to zero during the closure of the well, it usually begins to rise afterwards; but over time, the decline repeats.
As seen from Figure 4, the constructed neural network predicts downward slopes, flat lines and sudden upward jumps with high precision. Most approaches cannot accurately predict sudden upward jumps in time series [35]. As seen in Figure 4, the oil production rate increases considerably over a certain period of time and starts to decline later.
To ensure optimized training of the proposed multilayer neural network, numerous experiments were conducted at various values of the batch size parameter. Batch size is the number of training samples used in one iteration. At high values of the batch size parameter, the effectiveness of the model falls (Table 3 and Figure 5). In our work, the experiments are conducted with the following parameters: number of neurons 50, batch size 20, activation function ReLU, optimization function SGD, number of iterations 1000.

Conclusion
In this paper, we proposed a novel prediction model for oil well production forecasting based on the hybridization of the CNN and LSTM models, called the CNN+LSTM model. The hybridization of these methods improved the forecasting capability of the above-mentioned Deep Learning methods.
The case studies have shown that the hybrid CNN+LSTM model outperforms both the CNN and the LSTM in accuracy, which indicates that the proposed model is well suited to nonlinear forecasting problems in petroleum engineering.
The impact of the batch size parameter and the regularization parameter is also tested in this study. The results show that a smaller batch size makes the CNN+LSTM model more accurate, while a larger batch size makes it unsuitable; the CNN+LSTM model also produced smoother predictions when trained with a smaller batch size. These results all indicate that the optimal solution is obtained with the following parameters: number of neurons 50, batch size 20, activation function ReLU, optimization function SGD, number of iterations 1000.
A limitation of this study should also be noted: we have not proposed an algorithm to compute the optimal parameters of the proposed CNN+LSTM method. In fact, the selection of these parameters is an open problem in Deep Learning research, and no algorithm for selecting the optimal parameters is available so far. But it would be interesting, and might be possible, to find an optimal interval which contains the optimal parameters; once such an interval is found, it will be more computationally efficient to select the optimal parameters for the CNN+LSTM model.