Moving from research to production… with a Grin!


Written by Oded Krams | Algorithm Team Lead at Grin

Who loves a trip to the orthodontist? The coordination of schedules, a commute, waiting… all to have them review your teeth for a minute to assess the progress of treatment.

As part of the Grin® experience, the oral scans that pass through the Grin platform are automatically analyzed, and important information is extracted for our doctors and patients using specialized computer vision algorithms.

Like all AI developers, we need to make sure that the same algorithms we develop and test in our internal research lab will run just as smoothly and predictably on our production platform. In this article, I want to talk about that last step: the challenge of shifting from algorithm development to production.

Here’s why it’s hard

At Grin®, we have a web platform designed for our doctor clients to keep track of patient treatments. A key component of the treatment process, of course, is viewing video scans of the patient’s teeth. These video scans are recorded by patients themselves using the Grin App and the Grin Scope® from the convenience of home.

Following the upload of each video scan, our algorithm runs and processes the scan to deliver valuable insights to the doctor, such as a video summary or machine-assisted treatment tracking. Like many companies doing AI-driven, video-based analysis, we face an ongoing challenge: scalability.

Currently, video scans are uploaded to the platform approximately every four minutes, with peaks and spikes during the day, and we plan to keep growing. So we knew we had to prepare our inference infrastructure to scale, meaning compute resources that automatically expand or shrink on demand. It also had to be robust and reliable, and flexible enough to easily manage model updates.

Our workflow in a nutshell

The idea is quite simple. Each scan uploaded to the platform creates an algorithm job with the relevant configuration and adds this job to a job queue. A queue watcher keeps track of the number of jobs inside that queue, automatically creating and destroying algorithm workers depending on the number of jobs in the queue. These algorithm workers extract jobs from the job queue one by one until it is empty. When the job queue is empty, the queue watcher closes and destroys the algorithm workers until they are needed again.

This is a pretty standard framework for scalable job processing, as it is flexible and can expand or reduce in size as demand changes.
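
To make the pattern concrete, here is a minimal sketch of such a queue-watcher loop in Python. It is purely illustrative: the job queue, the worker helpers, and the polling interval are hypothetical placeholders, not our production code.

```python
import time

def queue_watcher(job_queue, spawn_worker, destroy_worker, poll_seconds=30):
    """Illustrative watcher: scale workers with the number of pending jobs."""
    workers = []
    while True:
        pending = job_queue.size()  # hypothetical queue API
        if pending > len(workers):
            # More pending jobs than workers: spin up another algorithm worker.
            workers.append(spawn_worker())
        elif pending == 0:
            # Queue drained: tear down all workers until they are needed again.
            while workers:
                destroy_worker(workers.pop())
        time.sleep(poll_seconds)
```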

Getting it done with an ML-Ops platform

After searching for the right platform to support this solution, we found ClearML. ClearML is a Python-based ML-Ops platform made up of several components. Initially, we were interested primarily in the job orchestration feature, but we then discovered that the experiment management component also provides helpful functionality.

Integrating ClearML with our code was quite simple: after adding just two lines of code, we could see our task in the experiment management UI. Running tasks on a remote machine took a bit more installation and configuration work. We started an AWS EC2 instance, installed and configured a ClearML agent on it, and used the UI to send tasks to its queue; the agent on the EC2 machine then pulled each job from the queue and ran it remotely. We were very pleased with this initial integration, as it gave us orchestration infrastructure without having to write it ourselves.
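
For reference, the two lines in question are ClearML’s standard tracking call; the project and task names below are placeholders rather than our real ones.

```python
from clearml import Task

# Registers this run with the ClearML server so it appears in the experiment UI.
task = Task.init(project_name="scan-analysis", task_name="process-oral-scan")
```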

But now we wanted more: full end-to-end automation.

Autoscaling Instances

Fortunately, ClearML already has an AWS Autoscaler, ready to set up machines and queues. In ClearML jargon, this is a service: a task that runs continuously and keeps watch on events in the system. (Yes, this is exactly what we needed from the queue watcher discussed above.)

We used ClearML to run the AWS Autoscaler service, which took just a few minutes to get up and running.

Now, instead of us spinning machines up and down manually, the AWS Autoscaler service automatically “sees” that a job is pending in a queue, creates and starts a suitable EC2 machine, and installs and configures all the execution dependencies. It is as efficient as it sounds; this workflow gave us (almost) full automation for running algorithm jobs on remote machines.

Now all that was left was creating and enqueuing the task.

Integrating the AWS Lambda function

Our platform team, in charge of the Grin web app, works mostly with AWS Lambda functions to manage events coming from the web app. AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers.

One Lambda function is responsible for sending the scans that patients upload to the doctor’s application. Our plan, then, was to use a similar Lambda function to create the algorithm tasks and enqueue them in ClearML.

Now, while AWS Lambda functions can run Python scripts easily, Lambda is a serverless platform, so we initially found it a bit difficult to install non-standard components like the ClearML Python package. The ClearML support team was on hand, ready to help us get it done. We implemented a Python script that uses the ClearML Python API to clone the algorithm task, change its configuration to use the new scan file, and enqueue it to the relevant queue of the AWS Autoscaler service. This script was deployed as an AWS Lambda function that the platform team can call with a defined configuration for each new scan. Finally, we handed the entry point of this automation over to the platform team, giving us a full, end-to-end automated flow.
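
As a rough illustration of that script, a Lambda handler along these lines clones a template task, points it at the new scan, and enqueues it for the autoscaler. The event fields, environment variables, queue name, and parameter key are assumptions for the sketch; Task.get_task, Task.clone, set_parameters, and Task.enqueue are standard ClearML calls.

```python
import os
from clearml import Task

TEMPLATE_TASK_ID = os.environ["TEMPLATE_TASK_ID"]         # base algorithm task to clone
QUEUE_NAME = os.environ.get("CLEARML_QUEUE", "aws_scans")  # queue watched by the autoscaler

def handler(event, context):
    """Called by the platform when a new scan is uploaded (event fields are hypothetical)."""
    # Clone the pre-configured algorithm task...
    template = Task.get_task(task_id=TEMPLATE_TASK_ID)
    job = Task.clone(source_task=template, name=f"scan-{event['scan_id']}")

    # ...point its configuration at the newly uploaded scan file...
    job.set_parameters({"General/scan_url": event["scan_url"]})

    # ...and enqueue it so the AWS Autoscaler spins up a machine to run it.
    Task.enqueue(task=job, queue_name=QUEUE_NAME)
    return {"clearml_task_id": job.id}
```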

But what about monitoring? How would the platform team know that an algorithm job had finished? Or worse, what if it failed? For that, we needed some task monitoring.

Closing the loop with Task Monitoring

It was time to add an independent, external service to monitor our production algorithm tasks and report when they were completed (or, of course, if there was a problem that stalled one). We didn’t want this to run internally (i.e., having each algorithm task itself report when it finishes or when it hits an error) because we were worried about unhandled exceptions: an algorithm task can fail unexpectedly, and with no outside monitoring, we’d never know! And of course, neither would the platform.

Luckily (again!), ClearML has a Python script that enables a Slack monitoring service. It runs continuously in the background, watches the running tasks, and reports specific events to a Slack channel. We wanted these same features but also needed the service to report task events to our platform. So we modified the script to send its messages to a predefined endpoint URL in our platform, notifying it of each algorithm task’s status. With this functionality in place, we finally had a fully automated, end-to-end procedure for running algorithm tasks from the platform, at scale, and with monitoring.
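
A hedged sketch of the idea: instead of (or alongside) Slack, poll ClearML for tasks that have completed or failed and POST their status to the platform. The endpoint URL, project name, and polling interval are placeholders, and a production version would persist which tasks were already reported.

```python
import time
import requests
from clearml import Task

PLATFORM_ENDPOINT = "https://platform.example.com/algorithm-status"  # placeholder URL
PROJECT_NAME = "scan-analysis"                                       # placeholder project

reported = set()
while True:
    # Find algorithm tasks in the monitored project that have finished or failed.
    tasks = Task.get_tasks(
        project_name=PROJECT_NAME,
        task_filter={"status": ["completed", "failed"]},
    )
    for t in tasks:
        if t.id in reported:
            continue
        # Notify the platform so it knows the algorithm job is done (or broke).
        requests.post(PLATFORM_ENDPOINT, json={"task_id": t.id, "status": t.get_status()})
        reported.add(t.id)
    time.sleep(60)
```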

Each of these improvements helped fill a gap where we had a clear vision of our goal but couldn’t easily build the optimal solution to reach it. Going with a best-of-breed, purpose-built solution with a long list of features designed for ML developers saved us both time and money. And most importantly for a young company, ClearML has allowed us to stay confident about quality and sharply focused on our core product development.