If you haven’t yet read through the first three quadrants of our Machine Learning (ML) Lifecycle series, we encourage you to take a moment to familiarize yourself with the building blocks of AI/ML project design, data preparation, and model fitting.
Once the team reaches the Inference and Deployment phase in the ML Lifecycle, they should be prepared to leverage the model to make viable predictions, post-process the results, and visualize them. That means deploying the model to a production environment in order to predict outcomes in the real world. Assuming the model is generating sufficiently accurate results, all that is left is for it to be used for its intended purpose.
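As a minimal sketch of that step, the snippet below loads a previously trained model and scores an incoming record; the file name, feature layout, and scikit-learn-style predict method are illustrative assumptions, not a fixed recipe.

```python
# Minimal inference sketch: load a trained model artifact and score new
# data. "model.pkl", the feature layout, and the scikit-learn-style
# predict() interface are hypothetical placeholders.
import pickle

import numpy as np

with open("model.pkl", "rb") as f:  # artifact produced during model fitting
    model = pickle.load(f)

new_batch = np.array([[0.2, 1.4, 3.1]])  # one incoming record, illustrative features
print("Predicted outcome:", model.predict(new_batch))
```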
Prior to implementation, the model should show evidence that it can meet a set accuracy threshold on data it has never seen before but that follows a distribution similar to the training data. In addition, stakeholders should agree on what levels of inaccuracy are acceptable, as not all behaviors and outcomes are adequately predictable.
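In practice, that gate can be as simple as the following sketch; the model object, holdout arrays, and 0.90 threshold are illustrative assumptions for this example:

```python
# Pre-deployment gate sketch: the model ships only if it clears the
# accuracy threshold on held-out data it has never seen. The 0.90
# threshold stands in for whatever level the stakeholders agreed on.
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.90  # acceptable-inaccuracy level set with stakeholders

def ready_for_deployment(model, X_holdout, y_holdout) -> bool:
    """Return True only if holdout accuracy clears the agreed threshold."""
    accuracy = accuracy_score(y_holdout, model.predict(X_holdout))
    print(f"Holdout accuracy: {accuracy:.3f} (threshold: {ACCURACY_THRESHOLD})")
    return accuracy >= ACCURACY_THRESHOLD
```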
This is a joint effort that involves analysts, data scientists, and software engineers. For example, software engineers might develop visualization techniques to deliver the output in a variety of formats, anything from a custom application to a Shapefile containing geographic information. Depending on the task, specialized post-processing techniques may also be required, such as triangulation algorithms that calculate a more precise location for a predicted object. Often, the model output doesn’t translate directly to the problem it is intended to solve, requiring an additional explanatory step. In those cases, we develop a pipeline that transforms, aggregates, or otherwise processes model outputs into something more applicable to end users, often delivered through a simple User Interface (UI).
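As one example of such a pipeline step, the sketch below, with entirely hypothetical field names, filters raw detector output and reshapes it into GeoJSON-style features that a mapping tool or simple UI could display:

```python
# Post-processing sketch: turn raw detector output (scores plus
# coordinates) into GeoJSON-style features an end-user tool can render.
# All field names and the 0.5 cutoff are hypothetical.
def detections_to_features(detections, score_cutoff=0.5):
    """Convert raw {'lon', 'lat', 'score', 'label'} dicts into GeoJSON features."""
    features = []
    for det in detections:
        if det["score"] < score_cutoff:
            continue  # drop low-confidence predictions before they reach users
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [det["lon"], det["lat"]]},
            "properties": {"label": det["label"], "score": round(det["score"], 3)},
        })
    return {"type": "FeatureCollection", "features": features}
```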
Challenges in the Inference and Deployment phase of the Machine Learning Lifecycle
Assuming the model is sufficient and the stakeholders are in agreement, this fourth and final phase in the ML Lifecycle is set up for success. It remains complex and tedious, however, because forward operational deployment (structuring the computing environment for successful ML in the real world) requires configuring many complicated dependencies and libraries. That challenge is made even more extreme by the restrictive nature of many government networks, which do not allow installation of the software required to perform machine learning tasks. The effectiveness of the ML system depends on the quality of the data and on how well the surrounding systems work together, so onsite data integration approaches must be capable of finding and highlighting useful data from the widest possible variety of sources.
Moving models seamlessly from one computing environment to another is complicated. It’s not unusual for a data science team to spend days getting CUDA, cuDNN, and the ML frameworks (TensorFlow, PyTorch, etc.) to play together nicely. The NT Concepts team combats this complexity by installing a few common tools on the client machine that can run a container holding all of the software needed to perform ML and data science.
Containerization is vital for deploying models and applications. We employ modern, environment-agnostic container orchestration platforms, an approach that allows models and applications to run quickly and reliably as they move from one computing environment to another: from on-premises servers to the cloud, and from customer networks to high-side networks.
Once prepared, the container can be deployed anywhere that is running Docker (or if necessary, NVIDIA Docker). Leveraging Docker containers is tremendously helpful in the development environment, provided you are able to produce a Docker image loaded with all of the correct dependencies.
Correctly installing the dependencies in a Docker container can take days or weeks. Recently, NVIDIA and Google released Docker images pre-loaded with all the libraries and dependencies required for machine learning; at a minimum, Docker must be installed on the host machine to use them. Some problems require a great deal of processing power that only graphics processing units (GPUs) can provide. Our team prefers to leverage GPU power, as it tends to accelerate the training of ML models, and we use NVIDIA Docker to allow containers to see and access the host’s GPUs.
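Inside such a container, the payoff is straightforward: the framework uses a GPU when NVIDIA Docker exposes one and falls back to the CPU otherwise. A minimal PyTorch sketch, with a toy model standing in for a real one:

```python
# Device-selection sketch: prefer the GPU that NVIDIA Docker exposes,
# fall back to CPU when none is visible. The Linear layer is a toy
# stand-in for a real trained model.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 2).to(device)   # toy model for illustration
batch = torch.randn(4, 10, device=device)   # four fake input records
print(f"Running on {device}; output shape: {tuple(model(batch).shape)}")
```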
We prefer the Docker setup because it can run a container with any code in it, fully leveraging the power of the server with no additional dependencies needed. Commonly, a container will include five elements (see the sketch after this list):
- The CUDA Toolkit, which allows code to communicate with GPUs
- NVIDIA cuDNN, a GPU-accelerated library of primitives for deep neural networks that provides highly tuned implementations for standard routines
- A programming language, commonly Python
- ML frameworks such as TensorFlow and PyTorch
- System- and language-specific dependencies
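As a sketch of how these elements come together, a startup check along the following lines can confirm each one is present before the container begins serving. It assumes a base image with both frameworks installed, and the sample dependencies at the end are placeholders:

```python
# Container smoke-test sketch: verify the five elements above before
# serving. Assumes an image with PyTorch and TensorFlow installed; the
# sample dependencies at the end are placeholders.
import importlib
import sys

import tensorflow as tf
import torch

print("Python    :", sys.version.split()[0])          # the language runtime
print("CUDA      :", torch.version.cuda)              # CUDA Toolkit the framework sees
print("cuDNN     :", torch.backends.cudnn.version())  # GPU-accelerated DNN primitives
print("PyTorch   :", torch.__version__)               # ML frameworks
print("TensorFlow:", tf.__version__)
for dep in ("numpy", "PIL"):                          # sample system/language dependencies
    importlib.import_module(dep)
print("All expected dependencies imported.")
```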
Additionally, the containers enable migration to a container orchestration platform like Kubernetes (k8s), an open-source system that automates the deployment, scaling, and management of containerized applications.
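For illustration only, a bare-bones deployment using the official Kubernetes Python client might look like the sketch below; the image name, labels, and replica count are assumptions, not our production configuration:

```python
# Kubernetes deployment sketch via the official Python client. The
# image, names, and replica count are hypothetical; a real deployment
# would also pin resources, GPUs, and credentials.
from kubernetes import client, config

config.load_kube_config()  # authenticate with the local kubeconfig

container = client.V1Container(
    name="object-detector",                     # hypothetical model service
    image="registry.example.com/detector:1.0",  # hypothetical container image
    ports=[client.V1ContainerPort(container_port=8080)],
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="object-detector"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # k8s keeps two pods running and replaces any that fail
        selector=client.V1LabelSelector(match_labels={"app": "object-detector"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "object-detector"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```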
Using this approach to the Inference and Deployment phase of the ML Lifecycle, NT Concepts has successfully containerized object detection models. In one instance, we containerized a weapons detection model. We have also loaded these models into Veritone’s aiWARE platform, where they operate on still images, video, and live streaming video. This model has been used successfully to develop a physical security concept that locks doors and deploys countermeasures in response to detected threats.
Provided that the ML Lifecycle has been constructed properly, with hidden challenges addressed in each quadrant, the data science team now has the models in place to apply AI/ML to the complex challenges and difficult questions set out at the start of the process. Our final piece in this series will bring together the entire ML Lifecycle.