Combining RPA and AI for Anomaly Detection

Robotic process automation (RPA) is a technology that allows computers to perform tasks within business processes that have traditionally been carried out by people.

Examples of tasks that RPA can handle include data entry and transfer; extracting structured data from PDFs, Excel files, and scanned documents; and generating and updating user information.

Implementing RPA processes sometimes requires a large initial cost, but in many situations the benefits of a successful implementation can offset these initial costs and save significant money in the medium to long term. These benefits include speeding up repetitive, time-consuming tasks and freeing up human labor for work that requires more sophisticated skills.

Traditionally, RPA processes have relied on rules to make decisions within different frameworks. These rules can be thought of as if-then statements that tell the computer what to do under different conditions: e.g. 'if a document contains the keyword "invoice", send it to the bill-processing department'. The disadvantage of a rule-based program is that it limits the scope of RPA, since many situations require complex decision-making that cannot be captured through rules alone. Incorporating AI into RPA, however, can extend RPA processes into areas where rule-based methods fall short.
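
To make this limitation concrete, here is a minimal sketch of a rule-based router in Python (a hypothetical illustration; the keywords and department names are made up):

# A hypothetical rule-based RPA decision. Every case must be anticipated
# and written out by hand; anything the rules miss falls through.
def route_document(text):
    if "invoice" in text.lower():
        return "bill-processing"
    elif "purchase order" in text.lower():
        return "procurement"
    else:
        return "manual-review"

A trained model, by contrast, can score cases the rule author never enumerated.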

In this post, we will merge AI and RPA for an anomaly detection application in finance. We will see how easy it is to marry the two using UiPath Studio and the Skymind Intelligence Layer (SKIL). (For readers who need to know how to log into SKIL and deploy models there, look here and here first.) Because we are mainly interested in the integration itself, we will only briefly cover model building and training.

Anomaly Detection Task

Our goal is to detect fraud in financial transactions. To surface anomalous transactions, we will first build and train a feed-forward neural network. The Zeppelin notebook containing the code for preprocessing the data and training the model is available here, and the data used for this blog post is publicly available here.

Once the model is trained, we will deploy it using SKIL. The deployed model can then be accessed within RPA processes using UiPath Studio.

Overview of Data

The data was generated using PaySim, a mobile money transaction simulator. The simulated transactions are based on real transactions provided by a multinational company. Each training instance in the dataset is a single transaction. The features include the type of transaction (payment, transfer, cash-out), the amount of the transaction in the local currency, the balance of the originating account before and after the transaction, the balance of the recipient account before and after the transaction, and an indicator of whether there was an attempt to transfer an amount of 200,000 or more between accounts. Each transaction also carries a label indicating whether or not it is fraudulent.
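
As a quick illustration, the dataset can be loaded and inspected with pandas (a sketch; the file name is an assumption, and the column names match the schema used later in this post):

import pandas as pd

# Load the PaySim transactions (the file name here is an assumption).
df = pd.read_csv("paysim_transactions.csv")

# Columns: step, type, amount, nameOrig, oldbalanceOrig, newbalanceOrig,
# nameDest, oldbalanceDest, newbalanceDest, isFraud, isFlaggedFraud.
print(df["type"].value_counts())  # distribution of transaction types
print(df["isFraud"].mean())       # fraction of transactions that are fraudulent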

Overview of Model

As mentioned, the model is a typical feed-forward neural network. It ingests all the features of a transaction, passes them through a number of hidden layers, and finally outputs the probability that the transaction is fraudulent. In other words, it is a binary classifier for fraud detection.

Code

We will go over only the pertinent pieces of the code in the Zeppelin notebook. One perk of using Zeppelin is that you can mix different programming languages within the same notebook. For this use case, we will build a deep neural network using the Python library Keras and do the data preprocessing with Skymind's DataVec library in Scala. To specify the programming language for a cell, simply put %pyspark or %spark on its first line to use Python or Scala, respectively. We will then use the skil Python library to add the fully trained model to the SKIL experiment. Again, read this post and this post if you are new to SKIL, to learn how to run Zeppelin notebooks and deploy machine learning models.
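
For example, a cell that should run as Python begins like this:

%pyspark
# Everything below the %pyspark directive in this cell is interpreted as
# Python; a cell starting with %spark would be interpreted as Scala instead.
print("preprocessing and training happen here")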

In the following subsections, we will present the code for creating the model and the transform that we'll deploy. Both are required because the raw data within the RPA process must be converted into a form a neural network can read before the model is queried: the transform processes the raw data, and its output is used to query the model.

Model

Here we present the Python code to train a classifier that predicts whether or not a given transaction is fraudulent. The feed-forward neural network is built with the Python library Keras. We will train the classifier on the training set of transactions and then import the trained model into SKIL using the skil Python library.

Below is the architecture of the feed-forward network we will train. There are two dense layers with the ReLU activation function. The first layer takes an input of dimension 7; the seven features are the type of transaction, the amount of the transaction, the balance of the originating account before and after the transaction, the balance of the recipient account before and after the transaction, and the indicator of whether there was an attempt to transfer an amount of 200,000 or more between accounts. These features are normalized before being passed to the model. The model ends with an output layer with a sigmoid activation function, which computes the probability that a given transaction is fraudulent. For training, a stochastic gradient descent optimizer is used with a binary cross-entropy loss function.

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

# Two hidden ReLU layers with dropout, and a sigmoid output for binary classification.
model = Sequential()
model.add(Dense(50, activation='relu', input_shape=(7,)))
model.add(Dropout(0.2))
model.add(Dense(50, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
sgd = SGD(lr=0.00001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])

To fit the model, we simply call the fit function on the processed data and labels. Note that the dataset is heavily class-imbalanced: there are very few fraudulent transactions compared to nonfraudulent ones. To compensate for the imbalance, we can pass class weights to the model. The class_weight dictionary below weights transactions with a label of 1 (fraudulent transactions) much more heavily than nonfraudulent ones; the weight is the ratio of the size of the full dataset to the number of fraudulent transactions. We then pass this class_weight dictionary to the model's fit function, which will weigh fraudulent transactions more heavily during training.

# Weight the fraud class by the ratio of all transactions to fraudulent ones.
class_weight = {0: 1., 1: len(df.index) / sum(df['isFraud'])}
model.fit(X_train, y_train, epochs=5, batch_size=16, validation_split=0.1,
          verbose=2, class_weight=class_weight)
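
Because the classes are so imbalanced, plain accuracy is a misleading metric here: predicting "not fraud" for every transaction would already score above 99%. A more informative sanity check is a confusion matrix on held-out data, as in the sketch below (assuming scikit-learn is available and X_test/y_test were split off the same way as the training data):

from sklearn.metrics import confusion_matrix, classification_report

# Threshold the sigmoid output at 0.5 to get hard 0/1 predictions.
y_pred = (model.predict(X_test) > 0.5).astype('int32').ravel()

# Precision and recall on the fraud class matter far more than raw accuracy
# when only a tiny fraction of transactions are fraudulent.
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=['legit', 'fraud']))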

Once the model is fully trained, we must add it to the SKIL experiment before deployment. Simply import the skil package, initialize a SkilContext, and call the addModelToExperiment function on the trained model. The z parameter of the addModelToExperiment function represents the Zeppelin context. This input is typically needed to interact with the SKIL machine and is used in functions that add models to experiments, add evaluations to models, and so on. Note that z does not need to be initialized; it ships with the Zeppelin notebook in SKIL.

import skil

# sc is the Spark context provided by Zeppelin; z is the Zeppelin context.
skilContext = skil.SkilContext(sc)
model_id = skilContext.addModelToExperiment(z, model)

Transform

Before deploying and querying the trained model, we must create a TransformProcess using the DataVec library. This step is needed to transform raw data before querying the deployed model within the RPA process.

The raw data is in the form of a CSV file, which includes all the features we will use for the model, as well as a time variable called step and the names of the originating and recipient accounts. We therefore initialize a Schema to define the format of the data before passing it to a TransformProcess.

import org.datavec.api.transform.schema.Schema

val schema = new Schema.Builder()
    .addColumnsDouble("step")
    .addColumnsString("type")
    .addColumnsDouble("amount")
    .addColumnsString("nameOrig")
    .addColumnsDouble("oldbalanceOrig", "newbalanceOrig")
    .addColumnsString("nameDest")
    .addColumnsDouble("oldbalanceDest", "newbalanceDest")
    .addColumnsInteger("isFraud", "isFlaggedFraud")
    .build()

We can then define a TransformProcess to specify the transformations we want to apply to the data. In this case, we remove the label (isFraud) along with the time and name variables; convert the type column from a string to a categorical variable and then to an integer; and finally scale the variables to the range 0 to 1. We then serialize the transform process to a JSON file saved in a temporary directory on the SKIL machine, which we will use to add the transform to a SKIL deployment.

import org.datavec.api.transform.TransformProcess
import org.datavec.api.transform.transform.normalize.Normalize

// `types` lists the transaction types in the dataset; `dataAnalysis` is a
// DataAnalysis object computed beforehand (e.g. with AnalyzeSpark.analyze).
val types = java.util.Arrays.asList("PAYMENT", "TRANSFER", "CASH_OUT", "CASH_IN", "DEBIT")

val tp_final = new TransformProcess.Builder(schema)
    .removeColumns("step", "nameOrig", "nameDest", "isFraud")
    .stringToCategorical("type", types)
    .categoricalToInteger("type")
    .normalize("type", Normalize.MinMax, dataAnalysis)
    .normalize("amount", Normalize.MinMax, dataAnalysis)
    .normalize("oldbalanceOrig", Normalize.MinMax, dataAnalysis)
    .normalize("oldbalanceDest", Normalize.MinMax, dataAnalysis)
    .normalize("isFlaggedFraud", Normalize.MinMax, dataAnalysis)
    .normalize("newbalanceOrig", Normalize.MinMax, dataAnalysis)
    .normalize("newbalanceDest", Normalize.MinMax, dataAnalysis)
    .build()

// Serialize the transform to JSON and save it for import into a SKIL deployment.
java.nio.file.Files.write(java.nio.file.Paths.get("/tmp/transaction-transform.json"),
    tp_final.toJson().getBytes())

Model Deployment

Deploying the model is easy once the training process is complete. We will assume a SKIL deployment is up and running. First, open the experiment used to train the model.

[Image: the experiment page in SKIL]

Then click on the Models tab, click Deploy, and follow the deployment wizard to add the model to a deployment.

[Image: the Deploy button on the Models tab]

Next, click on the Deployments tab on the left of the SKIL interface, select a deployment, and click on the start button to deploy a model.

[Image: starting the model from the deployment page]

We have successfully deployed the model. We can now access it within RPA processes using UiPath.

Transform Deployment

Deploying our saved TransformProcess is as easy as deploying the model. First, click on the Deployments tab in the SKIL interface and then select Import in the Transforms section.

[Image: the Import option in the Transforms section]

Once the Import Transform page comes up, name the transform and enter the path where the JSON file was saved. Then click "Import Transform" and start the deployment.

Querying the Deployed Model Using UiPath Studio

In this section, we will assume basic knowledge of UiPath Studio. If you are not familiar with creating RPA processes in UiPath Studio, there are myriad tutorials online. We will also assume that the Skymind activities for UiPath Studio have been installed; if they are not installed on your machine, look here.

The goal is to set up a workflow that automatically reads in the raw data as a CSV file, passes the raw data through the previously deployed transform, and sends its output to the deployed model to obtain the probability that the transaction is fraudulent. The final output on your machine will be a message box saying whether or not the transaction is fraudulent.
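
For context, the Skymind activities are essentially wrappers around the SKIL deployment's REST endpoints. The Python sketch below traces the same flow; the host, port, endpoint paths, and payload fields are all assumptions for illustration, not a documented contract:

import requests

base = "http://localhost:9008"  # SKIL host and port: an assumption

# Authenticate to SKIL and obtain a token (path and field names assumed).
login = requests.post(base + "/login",
                      json={"userId": "admin", "password": "admin"})
headers = {"Authorization": "Bearer " + login.json()["token"]}

# 1) Send one raw CSV row through the deployed transform (path assumed).
raw = {"record": "1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0,0"}
transformed = requests.post(
    base + "/endpoints/fraud/datavec/transaction-transform/default/transformincremental",
    json=raw, headers=headers).json()

# 2) Query the deployed classifier with the transformed features (path assumed).
result = requests.post(
    base + "/endpoints/fraud/model/fraud-model/default/classify",
    json=transformed, headers=headers).json()
print("fraudulent" if result["results"][0] == 1 else "not fraudulent")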

Once UiPath Studio is set up and the Skymind activities have been installed, start a new Sequence within UiPath Studio. The first step in the Sequence is to read in the data. We will use the Read CSV activity, which outputs the data as a DataTable; its properties are shown below. Here we assume there is a CSV file named "transaction.csv" containing data about a single transaction.

[Image: properties of the Read CSV activity]

Next, we will iterate through the rows of the DataTable using the For Each Row activity. Note that in this particular case, the DataTable has only one row.

Now drag the SKIL Deployment Scope into the body of the For Each Row loop. This activity will let us authenticate to the SKIL machine as well as query the deployed transform and model. The UiPath Sequence should now look something like this:

[Image: the Sequence with the SKIL Deployment Scope inside the For Each Row loop]

We must fill in the properties of the SKIL Deployment Scope: enter the deployment name, the username and password used to authenticate to your SKIL machine, and the SKILServerUrl. An example is shown below.

[Image: filled-in properties of the SKIL Deployment Scope]

Inside the SKIL Deployment Scope, you'll find an embedded sequence activity named Do. Within this Do activity, drag in a Transform Data activity and a Classify Data activity from the Skymind package. These two activities will look to the parent SKIL Deployment Scope for the authenticated connection to the SKIL server.

[Image: the Transform Data and Classify Data activities inside the Do sequence]

For the Transform Data step, enter the name of the deployed transform, the name of the transform's output variable, and the row variable from the For Each Row iterator as the input. Next, for the Classify Data step, enter the model name and the variables that will hold the confidence and prediction values output by the deployed model. Be sure to enter the transform's output variable into the Input field of the Classify Data scope; this allows the model to ingest the output from the transform.

After that, the properties of the Transform Data and Classify Data activities should look something like this:

[Image: properties of the Transform Data and Classify Data activities]

Note that the variables must be defined for the Sequence to run. Click on the Variables tab at the bottom and create all the variables needed. To find the type of variable required, click the button immediately to the right of the input text field in the transform and model scopes.

[Image: browsing for the required variable type]

As shown above, the required variable type is TransformResponse. You can browse for variable types when creating a variable, so be sure to look for the correct type.

At this point, you should have a Sequence that reads in a CSV file, transforms each row into a form the model can read, and passes the output to the deployed model. The prediction variable will contain the value of the prediction (0 for a non-fraud prediction, 1 for fraud).

Suppose we want to display whether or not the model has found a fraudulent transaction. We will need an If activity and a Message Box activity.

[Image: the If activity with Message Box branches]

As shown above, we will display a message indicating whether the transaction is fraudulent or not depending on the prediction of the model.

Now we can run our UiPath Sequence and see that, for this transaction, the model predicts it is not fraudulent.

[Image: the output message box]

Recap

To recap, we've shown how to integrate AI into an RPA process for anomaly detection using SKIL and UiPath Studio from start to finish. We started with preprocessing the data using the DataVec library and training a neural network using Keras to detect anomalies within a Zeppelin notebook. We then deployed a model and the data transform process in SKIL and showed how to query the model and transform within a UiPath Sequence using Skymind activities.

This sums up our first post on integrating AI tools into UiPath processes. Look out for more posts in the future!