Predicting Instacart Users' Purchasing Behavior using Multitask Learning

Predicting interests of website users is a challenge in online marketing and advertising. As predictions become more accurate, merchants can better personalize content for their users, which can lead to increased sales. One way to predict future interest is by taking advantage of a user's past search or purchasing history. For example, a user who frequently purchases hot cocoa during winter seasons is likely to purchase it again during future winters. This user may also be more likely to purchase milk, mugs, marshmallows, or cookies as well.

Today we will use Deeplearning4j (DL4J) to build a model that predicts the behavior of Instacart users. For those unfamiliar, Instacart is a grocery ordering and delivery app that makes shopping convenient for its users. To use the app, users simply select items through the interface, and personal shoppers review the order, do the in-store shopping, and deliver the products right to the user's door.

Using the information available on previous orders of Instacart users, our model will predict an aspect of the future purchasing behavior of those users. We will also experiment with multitask learning (simultaneously training a model with two related outputs) using a separate auxiliary target. We will then compare the results of the original model and the multitask model to see whether multitasking leads to an improvement on the main task.

Data

The data we will use was taken from a Kaggle challenge on predicting users' Instacart purchases. We first merged the given CSV files to obtain a combined dataset including user IDs, order numbers, and product information. The product information contains the product name, the aisle the product is in, and the department the product falls under. Overall, the products are organized into 134 aisles, which are grouped into 21 departments.

In the figure below, we can see how the aisles and departments are organized. Note that the figure was created by Miyabi Ishihara, a PhD student in statistics at UC Berkeley. The size of each block corresponds to how frequently users purchased products from each aisle and department. Thus, the figure shows that products in the produce department were purchased most frequently.

(Figure: treemap of Instacart aisles and departments, with block size proportional to purchase frequency)

We removed users who made only one order through the Instacart app and took a subset of 5,000 of the remaining users to form the dataset for this blog post. For each of these 5,000 users, we have a sequence of features representing information about each of their orders.

Task

The task we will focus on is building a model that predicts what kinds of products Instacart users will buy given their past order history. These kinds of models are useful for predicting purchases that users have not yet made. Knowing what kinds of products certain users are likely to buy is helpful for product placement on the interface, advertising, and so on.

To create the features for this task, we will use indicators representing whether a user purchased a product from each of the aisles. Since there are 134 aisles, we have 134 features for each order. The last order was removed from the features, since we will use the last order as part of the predictions. The targets for this task are a sequence of indicators representing whether or not a user bought an item from the breakfast department. Thus, the targets are a sequence of 0 or 1 values, making this a type of binary classification task. The auxiliary targets used for multitask learning will be indicators representing whether or not a user bought an item from the dairy department. The preprocessed data is available here.
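As a concrete sketch of this encoding, each order becomes a 134-length 0/1 vector of aisle indicators (the aisle IDs used here are hypothetical, purely for illustration):

```java
public class OrderEncoding {
    static final int NUM_AISLES = 134;

    // Encode one order as a 0/1 aisle-indicator vector of length 134
    public static double[] encodeOrder(int[] aisleIds) {
        double[] v = new double[NUM_AISLES];
        for (int a : aisleIds) {
            v[a] = 1.0;
        }
        return v;
    }

    public static void main(String[] args) {
        // Hypothetical order: items from aisles 3, 23, and 120
        double[] order = encodeOrder(new int[]{3, 23, 120});
        System.out.println(order[23]);  // an aisle the user bought from
        System.out.println(order[0]);   // an aisle the user did not buy from
    }
}
```

The target for this order would then be a single 0/1 value indicating whether the user's next order contained a breakfast-department item.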

LSTM

Because of the sequential nature of the data, we will apply a long short-term memory (LSTM) recurrent neural network (RNN) to this prediction task. We anticipate an LSTM will perform well because of the temporal dependencies that exist in the data. For example, if a user has previously bought a product from a certain aisle, such as a yogurt product, they might be more likely to purchase a breakfast product in the future.

The auxiliary task we used was predicting whether or not a user will purchase a product in the dairy department. We chose this auxiliary task because we suspect that the two targets are related. If a user buys a product in the breakfast department, we suspect they are more likely to purchase a product in the dairy department, such as eggs, milk, cheese, or yogurt.

Code

The code that extracts the sequences, performs vectorization, and builds and trains the single-task LSTM network is available in this Zeppelin notebook. We also have a separate Zeppelin notebook that uses a multitask LSTM network. In the following sections, we will guide you through the code. Because the code for training the single-task LSTM network is similar to material covered in previous blog posts, we will focus only on the code for training the multitask LSTM.

ETL and Vectorization

The first task is to process our data into a format a neural network can read (NDArrays). We will use tools from the open source Eclipse DataVec suite.

Our data is formatted into CSV files and contained in three directories: features, breakfast, and dairy. Each CSV file in the features directory contains a history of orders for a single user. Each row in the file represents a single order, and each column is an indicator for whether the user bought an item in a certain aisle. Each file in the breakfast and dairy directories contains a sequence of indicators representing whether or not the user bought a breakfast or dairy item in the next order. Thus, row 1 of a CSV file in the features directory represents the first order of that user, and row 1 of the corresponding files in the dairy and breakfast directories pertains to the second order.

To process these CSV files, we begin with a RecordReader, which is used to parse raw data into a structured, record-like format (elements indexed by a unique ID). Because our records are in fact sequences stored in CSV format (one sequence per file), we use the CSVSequenceRecordReader.

(Screenshot: ETL code from the Zeppelin notebook)
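The reader setup looks roughly like the following sketch. The file paths, index range, and the assumption that each file has no header row are ours, not necessarily what the notebook uses; the point is that each numbered CSV file holds one user's sequence:

```java
import org.datavec.api.records.reader.SequenceRecordReader;
import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
import org.datavec.api.split.NumberedFileInputSplit;

// One sequence per file; skip 0 header lines, split on commas (assumptions)
SequenceRecordReader featureReader = new CSVSequenceRecordReader(0, ",");
featureReader.initialize(new NumberedFileInputSplit("features/%d.csv", 1, 4000));

SequenceRecordReader breakfastReader = new CSVSequenceRecordReader(0, ",");
breakfastReader.initialize(new NumberedFileInputSplit("breakfast/%d.csv", 1, 4000));

SequenceRecordReader dairyReader = new CSVSequenceRecordReader(0, ",");
dairyReader.initialize(new NumberedFileInputSplit("dairy/%d.csv", 1, 4000));
```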

Because DL4J neural networks accept DataSets rather than records, we must initialize a DataSetIterator. To convert records into DataSets, we use a RecordReaderMultiDataSetIterator. A RecordReaderMultiDataSetIterator differs from the more typical RecordReaderDataSetIterator in that it can combine multiple RecordReaders representing multiple inputs or outputs. In our case, we have one input, the features, and two outputs for multitask learning: the original task (indicators of whether a user will purchase a breakfast item) and the auxiliary task (indicators of whether a user will purchase a dairy item). We only show how to do this for the training set, since the code for the test set is similar. Lastly, we pass the AlignmentMode.ALIGN_END flag, since the time series lengths can differ between sequences (users can have different numbers of previous orders). We are now finished with the ETL process.
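Wiring the three readers into a single iterator can be sketched as follows, assuming the readers from the previous step and a hypothetical batch size of 32:

```java
import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;

int batchSize = 32;  // assumed batch size
MultiDataSetIterator trainIter = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addSequenceReader("features", featureReader)
        .addSequenceReader("breakfast", breakfastReader)
        .addSequenceReader("dairy", dairyReader)
        .addInput("features")    // one input: the aisle indicators
        .addOutput("breakfast")  // main task labels
        .addOutput("dairy")      // auxiliary task labels
        // Sequences have different lengths, so align them at the end
        .sequenceAlignmentMode(RecordReaderMultiDataSetIterator.AlignmentMode.ALIGN_END)
        .build();
```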

Designing the Neural Network

As always, we will configure our model using the NeuralNetConfiguration class, which can be used to specify the hyperparameters of the model.

We use the configuration builder API to add one GravesLSTM layer and two output layers. Since the MultiLayerNetwork does not support multiple outputs, we will build a configuration for the ComputationGraph class using the graphBuilder method.

The GravesLSTM layer will take in a feature vector of length 134 at each time step and will have 150 hidden units. We will specify the tanh activation function, the Adam updater, and gradient normalization for this layer. Each output layer will take the 150 hidden units from the GravesLSTM layer as input and output a single value using the sigmoid activation function. The output layers will also use the Adam updater and gradient normalization to update their parameters.

(Screenshot: network configuration code from the Zeppelin notebook)
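A configuration along these lines can be sketched as follows. The seed, weight initialization, Adam hyperparameters, and clipping threshold are assumptions; the layer sizes, activations, and the one-LSTM/two-output graph structure follow the description above:

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.GradientNormalization;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(12345)                    // assumed seed for reproducibility
        .updater(new Adam())            // Adam with default hyperparameters (assumed)
        .weightInit(WeightInit.XAVIER)  // assumed weight initialization
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(10)
        .graphBuilder()
        .addInputs("input")
        // 134 aisle indicators in, 150 hidden units out
        .addLayer("lstm",
                new GravesLSTM.Builder().nIn(134).nOut(150)
                        .activation(Activation.TANH).build(), "input")
        // Main task: breakfast-department indicator, sigmoid + cross-entropy
        .addLayer("breakfast",
                new RnnOutputLayer.Builder(LossFunctions.LossFunction.XENT)
                        .activation(Activation.SIGMOID).nIn(150).nOut(1).build(), "lstm")
        // Auxiliary task: dairy-department indicator
        .addLayer("dairy",
                new RnnOutputLayer.Builder(LossFunctions.LossFunction.XENT)
                        .activation(Activation.SIGMOID).nIn(150).nOut(1).build(), "lstm")
        .setOutputs("breakfast", "dairy")
        .build();
```

Both output layers read from the same LSTM layer, so the shared recurrent representation is trained by gradients from both tasks, which is where any multitask benefit comes from.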

After setting the neural network configuration, we must not forget to initialize the network by passing the configuration to the ComputationGraph constructor and then calling the init() method, as below.

(Screenshot: network initialization code from the Zeppelin notebook)
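In outline, assuming the configuration object from the previous step:

```java
import org.deeplearning4j.nn.graph.ComputationGraph;

// Build the network from the configuration and allocate its parameters
ComputationGraph net = new ComputationGraph(conf);
net.init();
```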

Training the Neural Network

We will now train the multitask LSTM we configured. Like before, we iterate on the number of training epochs (complete passes through the training set) and use the fit method of the ComputationGraph directly on the training data iterator.

(Screenshot: training loop from the Zeppelin notebook)
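The training loop amounts to the following sketch; the epoch count here is an assumption:

```java
int nEpochs = 25;  // assumed number of epochs
for (int epoch = 0; epoch < nEpochs; epoch++) {
    net.fit(trainIter);  // one complete pass over the training data
    trainIter.reset();   // rewind the iterator for the next epoch
}
```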

Evaluating the Neural network

After model training, we will evaluate the model on a held-out test set of 1,000 users using the area under the ROC curve (AUC) metric. We simply loop over the test set, extracting the features, labels, and the model's output. We then use the evalTimeSeries method of the ROC class on the first target only (the breakfast department).

(Screenshot: evaluation code from the Zeppelin notebook)
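The evaluation loop can be sketched as follows, assuming a testIter built the same way as the training iterator; the threshold-step count for the ROC is an assumption:

```java
import org.deeplearning4j.eval.ROC;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.api.MultiDataSet;

ROC roc = new ROC(100);  // ROC with 100 threshold steps (assumed)
while (testIter.hasNext()) {
    MultiDataSet ds = testIter.next();
    INDArray[] output = net.output(ds.getFeatures(0));  // forward pass
    // Score only the first output: the breakfast (main) task
    roc.evalTimeSeries(ds.getLabels(0), output[0]);
}
System.out.println("AUC: " + roc.calculateAUC());
```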

Once this process is finished we can print out the final AUC. We see the model achieves an AUC of 0.75!

We compare this result to the result from training the single-task model in this Zeppelin notebook. We saw that the single-task model achieves an AUC of 0.64. Thus, multitasking has led to an improvement on our original task.

Conclusion

We have shown how to use DL4J to build an LSTM model that predicts an aspect of Instacart users' purchasing behavior, specifically whether or not a user will buy something from the breakfast department in the next order. Simultaneously training our LSTM network on an auxiliary target representing whether or not a user will buy something from the dairy department improved the results of our model on the original task. Due to the flexibility of the ComputationGraph class, applying multitask training in DL4J was straightforward.