ML Health: Fitness Tracking for Production Models

02/07/2019
by Sindhu Ghanta, et al.

Deploying machine learning (ML) algorithms in production for extended periods has uncovered new challenges, such as monitoring and managing a model's real-time prediction quality in the absence of labels. Such tracking is nevertheless imperative to prevent catastrophic business outcomes resulting from incorrect predictions. The scale of these deployments makes manual monitoring prohibitive, so automated techniques for tracking and raising alerts are essential. We present a framework, ML Health, for tracking potential drops in the predictive performance of ML models in the absence of labels. The framework employs diagnostic methods to generate alerts for further investigation. We develop one such method to monitor potential problems when production data patterns do not match training data distributions. We demonstrate that our method performs better than standard "distance metrics", such as RMSE, KL divergence, and Wasserstein distance, at detecting issues with mismatched data sets. Finally, we present a working system that incorporates the ML Health approach to monitor and manage ML deployments within a realistic full production ML lifecycle.
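
For readers unfamiliar with the baseline "distance metrics" the abstract names, the following is a minimal sketch of how RMSE, KL divergence, and Wasserstein distance might be computed between a training feature and its production counterpart. It is not the ML Health method, which the abstract does not describe in detail. The function names, the fixed-width histogram binning, the smoothing constant, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import math
import random


def histogram(values, lo, hi, bins):
    """Normalized histogram over [lo, hi] with equal-width bins (assumed binning)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp edge values into the outer bins
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def rmse(p, q):
    """Root mean squared error between two histograms with the same bins."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), with additive smoothing so empty bins do not yield log(0)."""
    p_s = [a + eps for a in p]
    q_s = [b + eps for b in q]
    zp, zq = sum(p_s), sum(q_s)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(p_s, q_s))


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def drift_scores(train, prod, bins=20):
    """Compute all three baseline scores for one numeric feature."""
    lo = min(min(train), min(prod))
    hi = max(max(train), max(prod))
    if hi == lo:
        hi = lo + 1.0  # avoid zero-width bins for constant features
    p = histogram(train, lo, hi, bins)
    q = histogram(prod, lo, hi, bins)
    return {
        "rmse": rmse(p, q),
        "kl": kl_divergence(p, q),
        "wasserstein": wasserstein_1d(train, prod),
    }


# Synthetic data (an assumption): one matching and one shifted production sample.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
prod_same = [rng.gauss(0.0, 1.0) for _ in range(2000)]
prod_shifted = [rng.gauss(1.5, 1.0) for _ in range(2000)]

same_scores = drift_scores(train, prod_same)
shifted_scores = drift_scores(train, prod_shifted)
print("matching production data:", same_scores)
print("shifted production data: ", shifted_scores)
```

In this sketch, each score should be noticeably larger for the shifted sample than for the matching one. Turning such scores into alerts still requires choosing thresholds per feature and per metric, which is one practical difficulty with using raw distance metrics for monitoring.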

research
02/22/2019

MPP: Model Performance Predictor

Operations is a key challenge in the domain of machine learning pipeline...
research
08/31/2021

Towards Observability for Machine Learning Pipelines

Software organizations are increasingly incorporating machine learning (...
research
11/26/2021

Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models

With the increasing adoption of machine learning (ML) models and systems...
research
03/03/2020

Model Assertions for Monitoring and Improving ML Models

ML models are increasingly deployed in settings with real world interact...
research
03/03/2020

Model Assertions for Monitoring and Improving ML Model

ML models are increasingly deployed in settings with real world interact...
research
09/16/2021

Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

Cloud networks are difficult to monitor because they grow rapidly and th...
