Building a Production-Grade Object Detection System with SKIL and YOLO

In this article, we take a state-of-the-art object-detection neural network and put it into production as a fully realized, maintainable object-detection system using the SKIL platform.


Building a computer vision system for production is hard, since there are a lot of factors to consider:

  • How do we build (or find) a network to do the predictions?
  • Where do we store the models in such a way that they can be updated or rolled back?
  • How do we serve model predictions as the demand from the client tier grows?

Beyond that, we need to consider what it is like in practice to work with complex results from an object-detection system.

This article will take you through a full lifecycle and leave you with a working application that you can modify for your own purposes. It will also leave you with an understanding of techniques such as:

  1. The native TensorFlow model import capabilities of SKIL
  2. Working with a live real-world computer vision object detection application

Now let’s dig into the basics of computer vision and object detection.

What is Object Detection?

Object detection in computer vision is the task of finding objects in images, where any given image may contain zero or more objects. Each object prediction is accompanied by a bounding box and a class probability distribution.
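To make that concrete, a single prediction can be modeled as a class label, a confidence score, and a bounding box. Here is a minimal sketch (the class and method names are illustrative, not part of any SKIL API):

```java
import java.util.ArrayList;
import java.util.List;

public class DetectionExample {

    // A single object prediction: a label, a confidence, and a bounding box
    static class Detection {
        final String label;
        final double confidence;
        final double x, y, width, height; // box in pixel coordinates

        Detection(String label, double confidence, double x, double y, double width, double height) {
            this.label = label;
            this.confidence = confidence;
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    // Keep only the predictions above a confidence threshold
    static List<Detection> filterByConfidence(List<Detection> detections, double threshold) {
        List<Detection> kept = new ArrayList<>();
        for (Detection d : detections) {
            if (d.confidence >= threshold) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Detection> raw = new ArrayList<>();
        raw.add(new Detection("person", 0.92, 10, 20, 50, 120));
        raw.add(new Detection("dog", 0.35, 200, 180, 80, 60));
        List<Detection> kept = filterByConfidence(raw, 0.5);
        System.out.println(kept.size());        // prints 1
        System.out.println(kept.get(0).label);  // prints person
    }
}
```

Thresholding on confidence like this is typically the first step a client takes before drawing boxes, so weak predictions are never rendered.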

Here are three important papers for state-of-the-art object detection:

Older methods for similar tasks, such as Haar Cascades, were much slower than these newer approaches. We're going to focus on the YOLOv2 network below.

With a YOLO network, we apply a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region.


These bounding boxes are weighted by the predicted probabilities. Each object is marked by a bounding box described by four variables: the center of the object (bx, by), the rectangle height (bh), and the rectangle width (bw).
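For rendering, the center-based coordinates (bx, by, bh, bw) usually get converted into top-left-corner form. A minimal sketch of that conversion (the method name is ours, not from the sample app):

```java
public class BoxConversion {

    // Convert a YOLO-style center-based box (bx, by, bw, bh)
    // into top-left-corner form (x, y, w, h) for drawing.
    static double[] centerToCorner(double bx, double by, double bw, double bh) {
        double x = bx - bw / 2.0; // left edge
        double y = by - bh / 2.0; // top edge
        return new double[] { x, y, bw, bh };
    }

    public static void main(String[] args) {
        // A box centered at (100, 60) that is 40 wide and 20 tall
        double[] corner = centerToCorner(100, 60, 40, 20);
        System.out.println(corner[0] + ", " + corner[1]); // prints 80.0, 50.0
    }
}
```

Most drawing APIs (including Java 2D's `drawRect`) take the top-left corner plus width and height, which is why this conversion shows up in nearly every YOLO client.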

We could train a YOLO network from scratch, but that would take a lot of work (and costly GPU hours). As engineers and data scientists, we want to leverage as many pre-built libraries and machine learning models as we can, so we’re going to use a pre-trained YOLO model to get our application into production faster and cheaper.

Using Pre-Trained Models with SKIL's Model Server

In a previous article on Oreilly’s blog, we talked about how:

“Integrating neural networks and convolutional neural networks into a production-ready enterprise application can be a challenge in itself, separate from the modelling task.”

The SKIL platform was designed to address many of the issues described there. In this article, we’ll take a look at leveraging SKIL to import externally-created native TensorFlow-formatted models and serve predictions with them from the SKIL Model Server.


Here, we’ll use the original authors' version of YOLOv2, trained on the COCO dataset. It can recognize 80 distinct classes.

The weights are taken from the links below and are listed under YOLOv2 608x608.

We’ve taken this model and converted it to the TensorFlow protobuf format (.pb) to get it ready to import into SKIL for inference serving. To make this tutorial simpler, we’ve hosted the converted model for you to download.

Serving Real-Time Object Detection Predictions

Machine learning practitioners often focus on the modeling aspect of machine learning without giving much consideration to the complete life-cycle involved in putting models into production. At the most general level, we need to consider the difference between machine learning modeling and model inference; i.e. serving the predictions after model training.


“Only a small fraction of real-world ML systems is composed of the ML code…”

SKIL lets teams separate workflow phases like modeling from serving inferences, so the ops team can focus on managing scale-out model inference serving while the data science team focuses on improving the model through further training. On the inference side, we have three major ways we can serve inferences:

  1. Classical OLTP-style single transaction inference requests across the network (slow but flexible).
  2. Large scale batch inference requests (OLAP-style; e.g. make inferences with the saved model across 1 million records in HDFS with Spark, one inference per record).
  3. A client requesting the latest copy of a model to cache locally and skip the network traversal for many inferences on the local copy.

In this tutorial, we’ll focus on the most basic type of inference, where we make a REST-based inference request across the network to get the predictions sent back across the network to a remote client application.

Loading a YOLO TensorFlow Model into the SKIL Model Server

This section assumes you already have SKIL set up. (If you don't, please take a look at our quickstart.)

Now we can log into SKIL and import the TensorFlow protobuf (.pb) file mentioned above.

  1. Log into SKIL
  2. Select the "deployments" option on the left side toolbar
  3. Click on the "New Deployment" button
  4. In the models section of the newly created deployment screen, select "Import" and locate the .pb file we created
  5. For the placeholders options:
    • Names of the Input Placeholders: "input" (make sure to press 'enter' after you enter the name)
    • Names of the Output Placeholders: "output" (also make sure to press 'enter' after you enter the name)
  6. Click on "Import Model"
  7. Click the "start" button on the endpoint

It will take a few seconds for the page to report that the endpoint has started successfully. Once the page lists the endpoint as running, you will have access to the model from the endpoint listed on the page. The endpoint URI will look something like:

http://localhost:9008/endpoints/tf2/model/yolo/default/
Now we need a client application to query this endpoint and get object-detection predictions.

Building an Object Detection Client Application

To simulate a real-world use case, we’ve included a sample client application that does more than make a REST call to the SKIL model server. We show some of the key parts of the SKIL client code in the code section below.

NativeImageLoader imageLoader = new NativeImageLoader(608, 608, 3, new ColorConversionTransform(COLOR_BGR2RGB));
INDArray imgNDArrayTmp = imageLoader.asMatrix( imgMat );
// Convert NCHW to NHWC and scale the pixel values into [0, 1]
INDArray inputFeatures = imgNDArrayTmp.permute(0, 2, 3, 1).muli(1.0 / 255.0).dup('c');

String imgBase64 = Nd4jBase64.base64String( inputFeatures );
Authorization auth = new Authorization();
long start = System.nanoTime();
String authToken = auth.getAuthToken( "admin", "admin" );
long end = System.nanoTime();
System.out.println("Getting the auth token took: " + (end - start) / 1000000 + " ms");

System.out.println( "Sending the Classification Payload..." );
start = System.nanoTime();
try {

    JSONObject returnJSONObject = Unirest.post( skilInferenceEndpoint + "predict" )
            .header("accept", "application/json")
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + authToken)
            // Build the JSON body by hand so the payload matches what the model server expects
            .body(new JSONObject()
                    .put( "id", "some_id" )
                    .put("prediction", new JSONObject().put("array", imgBase64))
                    .toString())
            .asJson()
            .getBody().getObject();

    end = System.nanoTime();
    System.out.println("SKIL inference REST round trip took: " + (end - start) / 1000000 + " ms");

    String predictReturnArray = returnJSONObject.getJSONObject("prediction").getString("array");
    System.out.println( "REST payload return length: " + predictReturnArray.length() );
    INDArray networkOutput = Nd4jBase64.fromBase64( predictReturnArray );

} catch (org.json.JSONException je) {

    System.out.println( "Could not parse the model server response: " + je );

}

The SKIL client code for this example performs the following tasks:

  1. Authenticate with SKIL and get a token
  2. Base64 encode the image we want predictions for
  3. Take the auth token and the base64 image bytes and send them via REST to SKIL for inference
  4. Base64 decode the results that returned from the SKIL model server
  5. Apply the post-inference activation functions (via the YoloUtils class) that TensorFlow models specifically require
  6. Render the output bounding boxes on the original image, as seen below


For normal DL4J and Keras-formatted models hosted in the SKIL model server, we don't have to apply post-inference activation functions. However, TensorFlow networks do not automatically apply activation functions to the final layer, so to complete this example we have to apply them in the client code with the provided YoloUtils class methods.
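The YoloUtils class handles this for the sample app. Conceptually, the raw TensorFlow output needs a sigmoid applied to values such as box centers and confidences, and a softmax over each cell's class scores. A standalone sketch of those two functions (this is an illustration, not the actual YoloUtils implementation):

```java
public class YoloActivations {

    // Squash a raw network output into (0, 1); used for box centers and confidences
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Turn raw class scores into a probability distribution
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s); // subtract the max for numerical stability
        double sum = 0.0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.0)); // prints 0.5

        double[] probs = softmax(new double[] { 2.0, 1.0, 0.1 });
        double total = 0.0;
        for (double p : probs) total += p;
        System.out.println(Math.abs(total - 1.0) < 1e-9); // prints true
    }
}
```

Applying these in the client rather than the server is a deliberate trade-off: the model server stays a generic tensor-in, tensor-out service, and model-specific post-processing lives with the application.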

Clone this repo with the command below to get the included YOLOv2 sample application that will retrieve predictions and render the bounding boxes locally:

git clone

We then need to specifically build the YOLOv2 client application JAR file:

cd skil_yolo2_app/client_app
mvn -U package

This will build a JAR file named skil-example-yolo2-tf-1.0.0.jar in the target/ subdirectory of client_app/.

Now that we have a client application JAR, we can run the yolo2 client JAR from the command line:

java -jar ./target/skil-example-yolo2-tf-1.0.0.jar --input [image URI] --endpoint [SKIL Endpoint URI]


  • --input can be any input image you choose (local file with the file:// prefix, or an image file via an internet URI with a http:// prefix)
  • --endpoint parameter is the endpoint you create when you import the TF .pb file

An example of this command in use:

java -jar ./target/skil-example-yolo2-tf-1.0.0.jar --input --endpoint http://localhost:9008/endpoints/tf2/model/yolo/default/


This client application lets us get predictions for any image and render the bounding boxes and classifications on the image, as seen above.

Summary and Future Ideas

The YOLO demo is a lot of fun to play with; try sending your own images through to see what it can pick out. If you want to reference a file on your local filesystem, just replace http:// with file:// in the URI, as in the example below:

java -jar ./target/skil-example-yolo2-tf-1.0.0.jar --input file:///tmp/beach.png --endpoint [SKIL Endpoint URI]

You’ll see that YOLO is quite adept at picking out even subtle objects, as we can see in the complex street scene below.


To read more about how YOLO works and what else you can build with it on SKIL, check out the following resources:



Skymind is the company behind Deeplearning4j, the only commercial-grade, open-source, distributed deep-learning library written for the JVM.
