Detecting the Presence of Clouds using Satellite Image Data

In this post, we will apply deep learning to the task of detecting clouds in polar regions from satellite image data. This is particularly challenging because the ice and snow covering the ground in these regions can look very similar to clouds. Nevertheless, cloud detection is an important task, because clouds obscure images and make it hard for remote sensors on satellites to obtain information about the Earth's surface. If not properly accounted for, clouds can adversely affect many applications, such as recognizing objects on the ground and detecting changes to the Earth's surface over time.

The data we will use originates from NASA's MISR instrument, which was launched in 1999 aboard the Terra satellite. MISR carries 9 optical sensors (cameras) that image the Earth from 9 different directions. The images used here were taken over Arctic regions. Overall, we will use data from 3 images to train and test a model for cloud detection. The network will ultimately output a binary label, which represents whether a cloud is present at a specific location.

Whether clouds are present at a particular location correlates with whether its neighboring locations contain clouds. Thus, we will also use the features of the 4 nearest locations as input to the model. Because the model takes in multiple feature sets, we will use the ComputationGraph class, which is more flexible than the MultiLayerNetwork class. The configuration of the model is explained in more detail later in the post.


The data is available in a tar.gz file located here. The file contains two directories: train and test. The train directory contains data from 2 images which will be used for the training set. Likewise, the test directory contains data from 1 image which will be used for the test set. Within each of the previously mentioned directories, there exist 5 subdirectories named n1, n2, n3, n4, n5. n1 contains the label and features as a csv file for the location in consideration. n2-n5 contain the features for each of the nearest 4 locations of the original location.

Each csv file contains radiances from 5 different directions, which can be used as features for the model. There are also 3 features developed from domain expertise for each location in a region. This gives 5 + 3 = 8 features per location, and thus 8 x 5 = 40 features overall for our model (8 for the original location plus 8 for each of the 4 nearest locations). The labels used for this task are expert labels. Some locations in the regions did not have expert labels, and these locations were removed from the data. An example image is shown below.



Our task is to read in the features for all the locations in the 2 images in the training set and predict labels for the locations in the last image in the test set. Using the expert labels for the test image, we will evaluate how well our model performs. It is important to separate the images into different sets, because we want the model to generalize to new images. This is how models are used in real applications (train on existing labeled data and predict on new data which may or may not have labels).


The ComputationGraph class of DL4J will be used to create the feed forward neural network for this task. Because the data is separated into different feature sets (features divided by location), the model will take in multiple input sets. The MultiLayerNetwork class does not support multiple inputs (or multiple outputs). The ComputationGraph class is more flexible than MultiLayerNetwork and can be used for this purpose instead.


The code that reads in the data, performs vectorization, and builds the ComputationGraph is available in this Zeppelin notebook. In the following subsections, we will explain the important concepts of the code that have not come up in previous blog posts.

ETL and Vectorization

As always, we first need to convert the raw data into a format a neural network can read (NDArrays). We will only focus on reading in and vectorizing the data in the training set. The process for the testing set is analogous to what will be shown for the training set.

Recall that the data is contained in files called "train.csv" in subdirectories n1, n2, n3, n4, and n5. Because our data is contained in separate csv files, we will use CSVRecordReaders to first parse the raw data into a record-like format. We will need 5 CSVRecordReaders since there are 5 separate csv files, as shown below.


Next we need to build a DataSetIterator that can directly feed the data as NDArrays into a MultiLayerNetwork or ComputationGraph. For this application, the RecordReaderMultiDataSetIterator class will be used, since the data is separated into multiple RecordReaders. It can be initialized with a RecordReaderMultiDataSetIterator.Builder.

The RecordReaders can be added using the addReader method of the builder class, which also assigns a name to each RecordReader of the DataSetIterator. Recall that each row of a csv file contains information about a particular location in an image. The first column of the csv file in subdirectory n1 corresponds to the expert label. We can initialize this output using the addOutputOneHot method of the builder class by specifying which RecordReader the output is contained in and the column number. The next 3 columns contain features developed from domain expertise. The csv files in subdirectories n2 - n5 contain only features, for the neighboring locations of the original location in n1. These features can be added using the addInput method.
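To make the column layout described above concrete, here is a minimal plain-Java sketch of what the vectorization step conceptually produces for one location. It deliberately does not use the DL4J API: the class name `RowVectorizer`, its methods, and the sample rows are hypothetical, and the assumption that every column after the label is a feature is ours rather than something the data description states in full.

```java
import java.util.Arrays;

// Hypothetical illustration only: mimics, in plain Java, how a label column
// and feature columns could be split out of csv rows. Not the DL4J API.
class RowVectorizer {

    /** Parses the columns from firstFeatureColumn onward as a feature vector. */
    static double[] parseFeatures(String csvRow, int firstFeatureColumn) {
        String[] cols = csvRow.split(",");
        double[] features = new double[cols.length - firstFeatureColumn];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(cols[firstFeatureColumn + i].trim());
        }
        return features;
    }

    /** One-hot encodes the label stored in column 0: 0 -> (1,0), 1 -> (0,1). */
    static double[] parseOneHotLabel(String csvRow, int numClasses) {
        int label = (int) Double.parseDouble(csvRow.split(",")[0].trim());
        double[] oneHot = new double[numClasses];
        oneHot[label] = 1.0;
        return oneHot;
    }

    public static void main(String[] args) {
        // Made-up rows: n1 has label + 8 features, n2..n5 have 8 features only.
        String n1Row = "1,0.1,0.2,0.3,10,20,30,40,50";
        String n2Row = "0.4,0.5,0.6,11,21,31,41,51";

        System.out.println(Arrays.toString(parseOneHotLabel(n1Row, 2)));
        System.out.println(parseFeatures(n1Row, 1).length); // features of the location itself
        System.out.println(parseFeatures(n2Row, 0).length); // features of one neighbor
    }
}
```

In the real pipeline the builder does this work for every row and every reader, and batches the results into NDArrays.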

Once the DataSetIterator is initialized, we are ready to configure and build a neural network.

Designing the Neural Network

As always we will configure our model using the NeuralNetConfiguration class. We can first specify the optimization algorithm (almost always stochastic gradient descent) and the updater using the NeuralNetConfiguration builder class. Because we are configuring a ComputationGraph, we then call the graphBuilder method before specifying the architecture of the network.


As shown above, we then set the inputs to the network. We have 5 sets of inputs (a feature set for a particular location and 4 additional feature sets for the closest neighbors) that we can specify by calling the addLayer method 5 times. To join these layers, we add a MergeVertex, which concatenates the outputs of the previous 5 layers into a combined representation. The combined vector is then passed to a dense layer and finally an output layer.
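The merge step can be pictured with a small plain-Java sketch. This is not DL4J code: the class `MergeSketch`, the ReLU activation, and the layer sizes are assumptions chosen for illustration. It only shows that each input branch produces its own activation vector and that merging amounts to concatenating those vectors in order.

```java
// Hypothetical illustration only: per-branch dense layers followed by a
// concatenation, which is the role the merge step plays in the graph.
class MergeSketch {

    /** Dense layer with ReLU (assumed): out[j] = max(0, b[j] + sum_i x[i] * w[i][j]). */
    static double[] dense(double[] x, double[][] w, double[] b) {
        double[] out = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            double sum = b[j];
            for (int i = 0; i < x.length; i++) {
                sum += x[i] * w[i][j];
            }
            out[j] = Math.max(0.0, sum);
        }
        return out;
    }

    /** Concatenates the branch outputs, in order, into one vector. */
    static double[] merge(double[][] branchOutputs) {
        int total = 0;
        for (double[] branch : branchOutputs) {
            total += branch.length;
        }
        double[] merged = new double[total];
        int offset = 0;
        for (double[] branch : branchOutputs) {
            System.arraycopy(branch, 0, merged, offset, branch.length);
            offset += branch.length;
        }
        return merged;
    }

    public static void main(String[] args) {
        int numBranches = 5;   // the location itself plus its 4 neighbors
        int hiddenSize = 3;    // hypothetical width of each branch layer
        double[][] branches = new double[numBranches][hiddenSize];
        System.out.println(merge(branches).length); // numBranches * hiddenSize
    }
}
```

The dense layer that follows the merge therefore sees one vector whose length is the sum of the branch widths.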

One thing to note is that we use a softmax layer for the output. Although the label is a single binary value (0 for no cloud and 1 for cloud), the data iterator we specified uses a one-hot representation for the output. This means that 0 is now represented as (1,0) and 1 is now represented as (0,1). Thus, a single sigmoid output unit no longer matches the shape of the labels, and a 2-unit softmax layer is used instead. For all practical purposes, the two formulations should perform equivalently.
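The equivalence claimed above follows from a short identity: for two logits z0 and z1, the softmax probability of class 1 is e^z1 / (e^z0 + e^z1) = 1 / (1 + e^-(z1 - z0)), which is a sigmoid applied to the difference of the logits. The sketch below checks this numerically in plain Java; the class name `SoftmaxVsSigmoid` and the example logits are hypothetical.

```java
// Hypothetical illustration only: a 2-class softmax gives the same
// probability for class 1 as a sigmoid applied to (z1 - z0).
class SoftmaxVsSigmoid {

    /** Numerically stable softmax: subtracts the max logit before exponentiating. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        double[] p = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < z.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double[] logits = {0.3, 1.7}; // made-up logits for (no cloud, cloud)
        double viaSoftmax = softmax(logits)[1];
        double viaSigmoid = sigmoid(logits[1] - logits[0]);
        System.out.println(Math.abs(viaSoftmax - viaSigmoid) < 1e-12); // prints "true"
    }
}
```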

Once the configuration is finished, we are ready to initialize a ComputationGraph and start the training process.

Training the Neural Network

We will now train the feed forward network we configured. We use a for loop iterating on the number of training epochs (complete passes through the training set) and use the fit method of the ComputationGraph directly on the training data iterator. The fit method will handle all the neural network training for us.
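For readers who want to see what "one epoch" means mechanically, here is a toy sketch of an epoch loop. It is not the network from this post and does not use DL4J: it swaps in a one-feature logistic regression trained by per-example gradient descent, and the class `EpochLoopSketch`, the data, and the learning rate are all made up for illustration.

```java
// Hypothetical illustration only: an epoch loop over a tiny data set,
// using logistic regression in place of the real neural network.
class EpochLoopSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Mean cross-entropy loss of the model p = sigmoid(w * x + b). */
    static double meanLoss(double[] xs, int[] ys, double w, double b) {
        double total = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double p = sigmoid(w * xs[i] + b);
            total += (ys[i] == 1) ? -Math.log(p) : -Math.log(1.0 - p);
        }
        return total / xs.length;
    }

    /** Runs numEpochs complete passes over the data; returns {w, b}. */
    static double[] train(double[] xs, int[] ys, int numEpochs, double learningRate) {
        double w = 0.0;
        double b = 0.0;
        for (int epoch = 0; epoch < numEpochs; epoch++) {
            // One epoch: every training example is visited exactly once.
            for (int i = 0; i < xs.length; i++) {
                double error = sigmoid(w * xs[i] + b) - ys[i];
                w -= learningRate * error * xs[i];
                b -= learningRate * error;
            }
        }
        return new double[]{w, b};
    }

    public static void main(String[] args) {
        double[] xs = {-2.0, -1.0, 1.0, 2.0}; // made-up feature values
        int[] ys = {0, 0, 1, 1};              // made-up labels
        double[] params = train(xs, ys, 20, 0.1);
        System.out.println(meanLoss(xs, ys, 0.0, 0.0));             // loss before training
        System.out.println(meanLoss(xs, ys, params[0], params[1])); // loss after 20 epochs
    }
}
```

In the actual notebook, the inner loop is what a single call to fit on the training iterator takes care of.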


Evaluating the Neural Network

After model training, we will evaluate the model on the held out test image using the area under the curve (AUC) metric based on the ROC curve. We simply call the evaluateROC method of the ComputationGraph on the test iterator.


After calling the calculateAUC method of the ROC class, we can see that the final AUC is 0.97!
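As a reminder of what the AUC measures, it can be read as the probability that a randomly chosen cloudy location receives a higher predicted score than a randomly chosen clear one. The plain-Java sketch below computes it directly from that definition by comparing every positive/negative pair. This is an assumption-laden illustration, not how the ROC class computes its value: the class `AucSketch` and the scores are hypothetical.

```java
// Hypothetical illustration only: exact AUC by pairwise comparison
// (ties count as half a win). Quadratic in the number of examples.
class AucSketch {

    static double auc(double[] scores, int[] labels) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) {
                continue; // outer loop walks over positives only
            }
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] != 0) {
                    continue; // inner loop walks over negatives only
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.3, 0.8, 0.1}; // made-up predicted cloud probabilities
        int[] labels = {1, 1, 0, 0};            // made-up expert labels
        System.out.println(auc(scores, labels)); // prints 0.75 (3 of 4 pairs ranked correctly)
    }
}
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the 0.97 reported above in context.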


We have shown how to apply deep learning to a cloud detection application using satellite image data. New concepts introduced in the post include using multiple RecordReaders to parse our data and a MultiDataSetIterator to handle multiple inputs to our network. The same process can also be used when the network has multiple outputs, as in this previous post. Because of the atypical neural network structure used (multiple inputs), we also had to use the ComputationGraph class of DL4J.

Although it is more work to include closest neighbor information in the neural network, doing so pays off with a high AUC. This AUC score was higher than the corresponding score for a neural network that excluded nearest neighbor data (that network, however, is not presented in this post). Thus, we see that taking advantage of the correlations between neighboring locations helps train a model for this task.