Why ML model deployment can be painful
Quite often I hear questions about ML model deployment. Let me tell you why I think ML model deployment can be a pain in the ass.
Hey all, I have not been writing for some time because life got quite intense. Now I am vaccinated, with a new house, and just turned 30. Nevertheless, here is my new blog post, especially for you.
"I made the app using Flask, containerized using Docker and deployed it to a Linux EC2" - this is the real quote I’ve seen. Like, what a big deal, right?
Wrong.
Now, let’s see why deployment is far more complex in general.
When it comes to DL model deployment, things get shaky. I believe that happens mostly because of the wide variety of options and purposes. Deep learning is a relatively new area, and for most of its existence the focus was on research - we were trying to create cool algorithms and make them technically accurate and robust. We have now succeeded in most areas, from computer vision to natural language processing - time to move on! From research, we are moving towards application. More solutions appear each month, and the question of deployment is becoming the number one priority.
Just imagine: you can train YOLOv5 out of the box from the first link in a Google search and make it work with an IP camera over the RTSP protocol with just a few lines of code (assuming you have annotated data, but that’s also not a problem nowadays). However, you still have to think about how to properly deploy this model so it can handle, for example, multiple users 24/7 in real time.
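To make the “few lines of code” part concrete, here is roughly what it looks like: a pretrained YOLOv5 pulled via torch.hub, fed frames from an IP camera over RTSP with OpenCV. The URL and credentials are made-up placeholders, and this is a bare-bones sketch, not production code.

```python
import cv2
import torch

# Pretrained small YOLOv5 model straight from the Ultralytics hub.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Placeholder RTSP address - swap in your camera's URL and credentials.
cap = cv2.VideoCapture("rtsp://user:password@192.168.1.10:554/stream")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV reads frames as BGR; YOLOv5 expects RGB.
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    results.print()  # log detections for this frame
cap.release()
```

Easy, right? And that is exactly the trap: this runs one stream on one machine and says nothing about serving many users reliably.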
Now, this is just a single example, and there are many different task-specific use cases. This field is not mature yet, so we feel some (or sometimes a lot of) pain.
Trying to summarize all my thoughts, I would say that deployment is not only about the technical stack you use. There are also:
Serving part. What is your way of talking to the model? Does it have an API? Versioning? How do you store the data the model analyzes, and how do you schedule users if it is intensive, heavy real-time analysis, etc.? (There is a small serving sketch after these points.)
Connected to the previous one - the monitoring part. Do you somehow log model mistakes (e.g. the user can press a “wrong prediction” button or something similar)? Do you log your hardware parameters? Can you detect model drift? (A toy drift check also follows these points.)
Communication part. In most industrial cases, you deal with existing databases that you have to connect your model to. Or maybe there has to be a connection with a specific machine or device. Then security issues and standards start popping up. Communication protocols, infrastructure, and architecture. All flavors of pain in the ass.
Hardware part. Where should your model run? Cloud? On-premise? An HPC server? Embedded devices like the Jetson Xavier? Imagine a hospital with a CT scanner hooked up to some PC that is not even connected to the internet, with a GPU/CPU as old as my grandma. So, still a Docker container with Flask?
* The next one is not mine, but I like it:
The next big one is authentication - you can deploy a Flask app to EC2, but what if you only want it to be available to users in your company? You could set up IP whitelisting and then require your users to connect through the VPN. But IPs can be spoofed, so maybe you need single sign-on, which means mucking about with Cognito, etc.
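To illustrate the serving part, here is a minimal sketch of a versioned prediction API. I am assuming FastAPI and a scikit-learn model saved with joblib; the file path and field names are hypothetical.

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("models/churn_v1.joblib")  # hypothetical model file

class PredictRequest(BaseModel):
    features: list[float]

# Version in the URL, so a retrained /v2/predict can coexist with /v1 later.
@app.post("/v1/predict")
def predict(req: PredictRequest):
    prediction = model.predict([req.features])[0]
    return {"model_version": "v1", "prediction": float(prediction)}
```

Run it with e.g. `uvicorn app:app` (if the file is called app.py) and you have an API with explicit versioning; data storage and request scheduling are still on you.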
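For the monitoring part, here is a toy drift check: compare the rolling mean of one live feature against the statistics stored at training time. The baseline numbers, window size, and threshold are invented for illustration - real monitoring would use proper statistical tests over many features.

```python
from collections import deque

TRAIN_MEAN = 0.42   # assumed: saved when the model was trained
TRAIN_STD = 0.10    # assumed training-time standard deviation
WINDOW = 500        # how many recent requests to look at

recent = deque(maxlen=WINDOW)

def record_and_check(feature_value: float) -> bool:
    """Record one live value; return True if the input looks drifted."""
    recent.append(feature_value)
    if len(recent) < WINDOW:
        return False  # not enough data to judge yet
    live_mean = sum(recent) / len(recent)
    # Flag drift if the live mean wanders more than 3 training stds away.
    return abs(live_mean - TRAIN_MEAN) > 3 * TRAIN_STD
```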
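And the crude IP-whitelisting idea from the authentication point, in Flask terms. The network range is an assumed company VPN subnet, and, as the point itself warns, IPs can be spoofed - treat this as a first fence, not real authentication.

```python
import ipaddress

from flask import Flask, abort, request

app = Flask(__name__)
ALLOWED_NET = ipaddress.ip_network("10.0.0.0/8")  # assumed VPN subnet

@app.before_request
def reject_outsiders():
    # Drop any request whose source IP is outside the allowed range.
    if ipaddress.ip_address(request.remote_addr) not in ALLOWED_NET:
        abort(403)

@app.route("/predict", methods=["POST"])
def predict():
    return {"prediction": "stub"}  # the actual model call would go here
```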
To dig deeper into deployment details, I recommend looking through this Deployment lecture.
At my current job we are discussing the best way to proceed with deployment right now. I am curious what we will come up with in the end, and I hope it won’t take too long - soon we will be deploying models like a piece of cake.
Thanks all for reading. This was Adel, your personal ML engineer. <3
See you soon!