Convolutional LSTMs for Sea Temperature Forecasting, Part 2

Summary: In this post, we extend the work from Part 1 by training our hybrid recurrent convolutional neural network to forecast sea temperatures multiple days into the future.

Here we will extend the work done in our previous post on applying convolutional long short-term memory (LSTM) networks to sea temperature forecasting. Previously, we built a neural network that reads in sequences of daily mean sea temperature grids and predicts temperatures a day ahead at each point in the sequence.

We will now apply a neural network for a slightly modified task using the same data and a similar neural network architecture. We will again use deeplearning4j (DL4J) to demonstrate how deep learning can be applied to the forecasting task. Because the material in this post contains many similarities with the previous material, we will focus on their differences. To avoid repetition, we will reference the previous post for concepts that are not new to this post.

Forecasting Task

As before, our aim is to model and predict the average daily ocean temperature at locations around the globe. Instead of predicting the temperature one day ahead at each point of a given sequence, we pursue the more realistic aim of predicting a sequence of temperatures that follows a given sequence.

Like before, we define a two-dimensional (2-D) 13-by-4 grid over a regional sea, such as the Bengal Sea, yielding 52 grid cells. At each grid location, we observe a sequence of 50 daily mean ocean temperatures. Our task is to read in the temperatures at each location in the grid and predict temperatures at each location for the 10 days following the sequence as shown in the figure below.


A model that can accurately predict data multiple days into the future has greater flexibility and practicality than a model that only predicts a day ahead in the sequence. Weather forecasts are often made for multiple days into the future given a history of data. These longer-term forecasts are more useful than short-term forecasts, since more can be inferred about the future. We will show how this can be achieved with a minimal number of changes to our previous code.


Recall that the data consists of mean daily temperatures of the ocean from 1981 to 2017. We used temperatures taken from the Bengal, Mediterranean, Korean, Black, Bohai, Okhotsk, Arabian, and Japan seas. They are stored in CSV files, each of which holds a sequence of 50 contiguous daily mean temperatures for a portion of a sea. Every CSV file contains 50 rows representing a sequence of 50 days and 52 columns representing 52 points in the 13-by-4 grid. For a more in-depth overview of the data, see the previous post.


Only slight changes were made to the processed data. As before, we created sequences of 50 temperature grids from the raw data to use as features. Similarly, the target sequences are also 50 temperature grids, taken one step ahead of the temperatures in the feature sequence. However, we also extracted 10 days' worth of temperatures following each target sequence. These temperatures will be used to evaluate the network's forecasts further into the future than before. Because we do not want sequences (together with the 10 days of temperatures that follow them) to overlap in both location and time, the overall number of sequences was reduced from 2037 to 1736. However, we are still using the same amount of data overall as before.
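The windowing described above can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions, not the preprocessing code used for the post: the names SequenceWindows, Window, and windows are hypothetical, and the sketch assumes each window is cut from 61 consecutive days of a single region (50 feature days, targets shifted by one day, then 10 future days).

```scala
object SequenceWindows {
  // One grid per day: a vector of 52 cell temperatures.
  type Grid = Vector[Double]

  // Hypothetical container for one linked feature/target/futures triple.
  final case class Window(features: Vector[Grid], targets: Vector[Grid], futures: Vector[Grid])

  val seqLen = 50  // days in each feature and target sequence (from the post)
  val horizon = 10 // extra days kept for evaluation (from the post)

  // Cut non-overlapping windows out of one region's daily series.
  // Assumption: each window spans seqLen + 1 + horizon consecutive days.
  def windows(days: Vector[Grid]): Vector[Window] = {
    val span = seqLen + 1 + horizon
    days.grouped(span).filter(_.length == span).map { chunk =>
      Window(
        features = chunk.slice(0, seqLen),       // days 0..49
        targets  = chunk.slice(1, seqLen + 1),   // days 1..50, one step ahead
        futures  = chunk.slice(seqLen + 1, span) // days 51..60
      )
    }.toVector
  }
}
```

Under this assumption, a series of 130 days yields two complete windows, and the leftover days at the end are discarded.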

Sequences 1 to 1600 contain temperatures taken from the specified seas from 1981 to 2014. They comprise the training set. The remaining sequences contain temperatures taken from 2015 to 2017 and comprise the testing set. Thus, we aim to train on the past and test on the future.

The feature and target sequences are stored in CSV files in separate directories. The files are named from 1.csv to 1736.csv, and sequences that share a file name are linked. For example, the file 1.csv in the target directory contains temperatures one step ahead of the temperatures in the file 1.csv in the feature directory. This is identical to how the data was organized before, except that there are fewer sequences in total. Unlike before, we have a third directory (futures) containing the temperatures for the 10 days following a target sequence. Thus, the file named 1.csv in the futures directory contains the temperatures for the 10 days following the sequence in the file 1.csv in the target directory.
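To make the file layout and the train/test split concrete, here is a small sketch using only the Java standard library. The directory name futures comes from the description above; the names features and targets, the root argument, and the SequenceFiles object itself are assumptions made for illustration.

```scala
import java.nio.file.{Path, Paths}

object SequenceFiles {
  val totalSequences = 1736    // from the post
  val lastTrainingIndex = 1600 // sequences 1..1600 form the training set

  // "futures" is named in the post; "features" and "targets" are assumed names.
  def featurePath(root: Path, i: Int): Path = root.resolve("features").resolve(s"$i.csv")
  def targetPath(root: Path, i: Int): Path  = root.resolve("targets").resolve(s"$i.csv")
  def futurePath(root: Path, i: Int): Path  = root.resolve("futures").resolve(s"$i.csv")

  // Train on the past (1981 to 2014), test on the future (2015 to 2017).
  val trainingIndices: Range = 1 to lastTrainingIndex
  val testingIndices: Range  = (lastTrainingIndex + 1) to totalSequences
}
```

With these numbers, the testing set holds the 136 sequences numbered 1601 to 1736, and the three files that share an index always describe the same stretch of days.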

Convolutional LSTM Network

We again will apply a variant of a convolutional long short-term memory (LSTM) RNN to this problem. For a description of the network architecture and why this type of architecture is suited to our data see our previous post.

One difference from before is that a subsampling layer was added to the network used in this post. Thus, this neural network consists of a convolutional layer, a subsampling layer, a Graves LSTM layer, and an RNN output layer in succession. We found that adding this subsampling layer after the convolutional layer improved the accuracy of the forecasts. In general, subsampling layers reduce the number of features and the computational complexity of the network. Thus, this layer is one way to help keep the neural network from overfitting the data.

More specifically, the network will accept two inputs at each time step: the grid of current temperatures (x in the figure) and a vector of network hidden states (h in the figure) from the previous time step. We process the grid with one or more convolutional filters and use a subsampling layer (s(x) in the figure) to reduce the number of features of the network using the max operation. We then flatten the output from the subsampling layer and pass it to an LSTM RNN layer along with the previous hidden states. The LSTM RNN layer updates its gate functions and its internal state (c’ in the figure). Finally, the LSTM emits an output (h’ in the figure), which is used both to predict temperatures at the next step and as an input at the next time step (h in the figure).



The code that extracts the sequences, performs vectorization, and builds and trains the neural network is available in a Zeppelin notebook using Scala. In the following sections, we will guide you through the code.

ETL and Vectorization

The ETL process is nearly identical to the ETL process described in the previous post. The only difference is that we initialize an additional DataSetIterator containing the 10 days' worth of temperature grids following each target sequence.

To create these DataSetIterators, we use the same process as before: we first initialize RecordReaders, which parse raw data into a structured record-like format (elements indexed by a unique id), and then initialize DataSetIterators, which take the previously created RecordReaders as input.


For a more in-depth description of this process and the classes used for ETL and vectorization, see our previous post.

Designing the Neural Network

We configure our neural network model using the NeuralNetConfiguration class. Using the configuration class, we can specify hyperparameters such as the optimization algorithm, number of hidden layers, and a custom updater.

We use the configuration builder API to add three hidden layers and an output layer. Like before, the first is a 2-D convolutional layer whose filter size is determined by the variable kernelSize. The next layer is a subsampling layer with the max pooling operation. This layer steps through the output of the convolutional layer in 2-by-2 blocks and extracts the maximum value in each block. This reduces the number of features fed to the next layer.
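The max pooling operation itself is easy to show in isolation. The following is a dependency-free sketch of non-overlapping 2-by-2 max pooling on a single 2-D feature map; it is not DL4J's SubsamplingLayer, the MaxPooling object and maxPool function are hypothetical names, and dropping incomplete edge blocks is an assumption of this sketch.

```scala
object MaxPooling {
  type Grid = Vector[Vector[Double]]

  // Non-overlapping max pooling with a block-by-block window.
  // Rows or columns that do not fill a complete block are dropped.
  def maxPool(grid: Grid, block: Int = 2): Grid = {
    val outRows = grid.length / block
    val outCols = if (grid.isEmpty) 0 else grid.head.length / block
    Vector.tabulate(outRows, outCols) { (r, c) =>
      // Collect the block * block cells covered by this output position.
      val cells = for {
        i <- 0 until block
        j <- 0 until block
      } yield grid(r * block + i)(c * block + j)
      cells.max
    }
  }
}
```

Each output value summarizes four input values, so a feature map shrinks to roughly a quarter of its size before it reaches the LSTM.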


The next layer is a Graves LSTM RNN with 200 hidden units and a softsign activation function. The final layer is an RnnOutputLayer with 52 outputs, one per temperature grid cell. For this regression task, we use the mean squared error loss function and, since we are predicting a continuous value, the identity activation function for the output layer.

Like before, we use an RnnToCnnPreProcessor for the first layer, which reshapes each vector into a grid before the convolutional layer is applied. Likewise, we use a CnnToRnnPreProcessor to flatten the output from the subsampling layer before passing it to the LSTM.
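Conceptually, the two preprocessors perform a reshape and its inverse. The sketch below shows that idea for a single example in plain Scala. It is not the DL4J implementation, which also handles batch, channel, and time dimensions; the GridReshape object is a hypothetical name, and row-major ordering is an assumption.

```scala
object GridReshape {
  val height = 13 // grid rows (from the post)
  val width = 4   // grid columns (from the post)

  // Reshape a flat vector of 52 values into a 13-by-4 grid.
  // Row-major order is an assumption made for this sketch.
  def toGrid(flat: Vector[Double]): Vector[Vector[Double]] = {
    require(flat.length == height * width, s"expected ${height * width} values")
    flat.grouped(width).toVector
  }

  // Flatten a grid back into a single vector, row by row.
  def flatten(grid: Vector[Vector[Double]]): Vector[Double] = grid.flatten
}
```

The round trip flatten(toGrid(v)) returns v unchanged, which is the property that lets convolutional layers and recurrent layers sit in the same network.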

After building our neural network configuration, we initialize a neural network by passing the configuration to the MultiLayerNetwork constructor and then calling the init() method, as below.


Training the Neural Network

We will now train the neural network we configured. We loop over the training epochs (complete passes through the training set) and, in each one, call the fit method of the MultiLayerNetwork on the training data iterator. Note that the 10 days' worth of temperature grids following the target sequences are not used in the training process. Only the feature and target sequences are used to train our neural network.


Evaluating the Neural Network

This section will deviate the most from our previous post. Thus, pay close attention to what is happening here.

Like before, we will train on the past and test on the future, which mimics how the model would be used in real world settings. This is inherent in our ETL and modeling process, since the training set contains data from years 1981 to 2014, while the testing set contains data from 2015 to 2017, as mentioned before.

To evaluate on the testing set, we first initialize a RegressionEvaluation object, which will ultimately measure how well the neural network makes forecasts. We then use a simple while loop to iterate over the batches of the test data iterator. The first line within the loop extracts the DataSet object from the iterator. This DataSet object contains the features and labels for the sequences in a batch. We then extract the INDArray containing the features of the batch and initialize an INDArray intended to hold predictions from the MultiLayerNetwork.

The INDArray containing the features is three-dimensional: the first dimension represents the number of examples in the batch, the second represents the number of features (points in the grid, 52 in our case), and the third represents the time steps of the sequence (50 in our case). In contrast, the INDArray intended to hold the predictions is two-dimensional. Its first dimension represents the number of examples in the batch and its second represents the number of features. There is no third dimension for time steps, since the neural network predicts one day ahead at a time.


We then iterate over the days of the sequence, of which there are 50 in total. Within the loop, we call the rnnTimeStep method of the MultiLayerNetwork on the INDArray containing the features of the batch at time t. This method serves two purposes. First, it outputs the prediction of the neural network (for time t+1) by performing a forward pass using the stored state of the neural network. Second, it updates that internal state. Thus, when rnnTimeStep is called for the next time step, the neural network will correctly make a prediction for time t+2.

We therefore do a full pass over the sequence with the rnnTimeStep method before making the real predictions (the forecasts for the 10 days following each target sequence). This pass updates the internal state of the neural network so that its output represents the temperatures following each target sequence. If this is not done, the full history of the sequence will not be used for the predictions and the network will likely perform poorly.

Our next step is to make predictions for the 10 days following each sequence and evaluate the neural network. To start, we extract a DataSet containing the 10 days' worth of temperature grids following each sequence from the futures data iterator. Another INDArray containing the features from the futures iterator is initialized. Next we create a for loop that iterates over each of the 10 days we forecast. Within the loop, we compare our predictions with the actual temperatures using the RegressionEvaluation object we initialized previously. We then call the rnnTimeStep method again to update the internal state and make a new prediction.

The final line of the while loop, which iterates through batches of the testing set, clears the internal state of all recurrent layers of the network. This must be done since we want to make predictions for a new batch at a future iteration of the loop. Thus, the stored state for the current batch should be cleared.
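The warm-up, forecast, and reset pattern described in this section can be sketched without DL4J. The code below uses a toy stateful model (an exponential moving average) in place of the trained network, so every name in it (RollingForecast, StatefulModel, MovingAverageModel, forecast) is hypothetical. It also assumes that each forecast is fed back in as the next input, which the description above does not state explicitly. In the real code, step corresponds to rnnTimeStep and clearState to clearing the stored recurrent state.

```scala
object RollingForecast {
  type Grid = Vector[Double]

  // Minimal stand-in for a stateful recurrent model (hypothetical trait).
  trait StatefulModel {
    def step(input: Grid): Grid // updates internal state, returns the next-day prediction
    def clearState(): Unit
  }

  // Toy model: keeps an exponential moving average per cell as its state.
  final class MovingAverageModel(alpha: Double) extends StatefulModel {
    private var state: Option[Grid] = None

    def step(input: Grid): Grid = {
      val next = state match {
        case Some(s) => s.zip(input).map { case (old, x) => (1 - alpha) * old + alpha * x }
        case None    => input
      }
      state = Some(next)
      next
    }

    def clearState(): Unit = { state = None }
  }

  // Warm up on the observed history, then forecast `horizon` days by
  // feeding each prediction back in as the next input (an assumption).
  def forecast(model: StatefulModel, history: Seq[Grid], horizon: Int): Vector[Grid] = {
    require(history.nonEmpty && horizon > 0)
    var pred: Grid = Vector.empty
    for (day <- history) pred = model.step(day) // warm-up pass over all observed days
    val out = Vector.newBuilder[Grid]
    for (_ <- 0 until horizon) {
      out += pred             // record the forecast for this day
      pred = model.step(pred) // advance the state one more day
    }
    model.clearState()        // reset before the next batch
    out.result()
  }
}
```

Skipping the warm-up loop would leave the model with no memory of the 50 observed days, and skipping clearState would let one batch's history leak into the next batch's forecasts.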

Once this process is finished, we can print the statistics from the RegressionEvaluation object. We see that our predictions are on average approximately 1 degree away from the actual temperatures, and the R^2 is in the high 0.9s!
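For readers who want to see what such statistics measure, here is a plain Scala sketch of the standard definitions of mean absolute error, root mean squared error, and R^2 over a flat list of values. It illustrates the formulas only and is not the RegressionEvaluation implementation, which may aggregate differently (for example, per output column); the RegressionMetrics object is a hypothetical name.

```scala
object RegressionMetrics {
  // Mean absolute error: the average size of the forecast error, in degrees.
  def mae(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length)
    actual.zip(predicted).map { case (a, p) => math.abs(a - p) }.sum / actual.length
  }

  // Root mean squared error: like MAE, but penalizes large errors more heavily.
  def rmse(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length)
    val mse = actual.zip(predicted).map { case (a, p) => (a - p) * (a - p) }.sum / actual.length
    math.sqrt(mse)
  }

  // R^2 = 1 - SS_res / SS_tot: the share of variance explained by the forecasts.
  def rSquared(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length)
    val mean = actual.sum / actual.length
    val ssRes = actual.zip(predicted).map { case (a, p) => (a - p) * (a - p) }.sum
    val ssTot = actual.map(a => (a - mean) * (a - mean)).sum
    1.0 - ssRes / ssTot
  }
}
```

An R^2 of 1 means the forecasts match the observations exactly, while an R^2 of 0 means they do no better than always predicting the mean temperature.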



We have shown how to use Eclipse DL4J to improve upon our previous work done in forecasting sea temperatures. Specifically we were able to provide accurate forecasts 10 days further into the future using a convolutional LSTM RNN. We observed that adding a subsampling layer with the max pooling operation improved the accuracy of the forecasts by reducing the number of features and preventing the neural network from overfitting the data.