Introduction
We have designed and implemented a flexible and scalable machine learning framework that a) allows rapid development of models offline, b) makes deploying models into production easy and reliable, and c) supports multiple frameworks and algorithms. The framework provides a microservice that serves near real-time predictions from multiple models. Models can be updated at a custom frequency. The prediction service can be scaled across multiple instances, which makes it robust to hardware failure and lets it grow to support more traffic. We also have extensive monitoring and alerting in place to track both model and prediction service performance.
General
- A generalized framework makes it easy to create a new model by writing a JSON config and sub-classing a base class. The subclass needs to define methods that specify how to fetch the data, how to identify positive and negative examples, etc. Additional methods can be used to generate custom evaluations on both the training and testing data.
- The framework currently supports the Vowpal Wabbit (VW) and XGBoost machine learning frameworks under the hood.
- Features are enumerated in a JSON config file. The config file also allows:
  - Feature buckets: specifying bucket boundaries for features.
  - Feature defaults for missing values: these can be absolute values or the max, min, median, or mean, in which case the default is inferred from the training data.
  - Quadratic/cross features: the ability to multiply two features to create a new feature. This includes crossing bucketed features.
  - Excluding non-cross/quadratic features: for when we want to use a feature only in a cross feature and not as a feature by itself. For context, VW allows us to do quadratic features but doesn’t allow this exclusion.
- Ability to specify VW namespaces and other command line flags.
- Dynamic config: features, thresholds, and values are calculated from the training data and then reused during testing as well as for predictions.
- Dynamic class generation: at runtime, a class is generated that combines the chosen machine learning framework with the algorithm to use from it.
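The config-driven feature handling described above can be illustrated with a minimal sketch. The config layout, the key names (`features`, `default`, `buckets`, `crosses`, `exclude`) and the helper functions are all hypothetical, since the post does not show the real schema. Crossing is done here as a plain numeric product of the (possibly bucketed) values, which is a simplification of what a library like VW does internally.

```python
import bisect
import statistics

# Hypothetical config layout; the framework's real schema is not shown in the post.
CONFIG = {
    "features": {
        "age": {"default": "median", "buckets": [18, 30, 50]},
        "visits": {"default": 0},
    },
    "crosses": [["age", "visits"]],
    "exclude": ["visits"],  # keep "visits" only inside the cross feature
}

STAT_FNS = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
}


def infer_defaults(config, training_rows):
    """Resolve each feature default; named statistics are computed from training data."""
    defaults = {}
    for name, spec in config["features"].items():
        default = spec.get("default", 0)
        if isinstance(default, str):
            observed = [r[name] for r in training_rows if r.get(name) is not None]
            default = STAT_FNS[default](observed)
        defaults[name] = default
    return defaults


def transform(row, config, defaults):
    """Fill missing values, bucket, cross, and drop excluded base features."""
    values = {}
    for name, spec in config["features"].items():
        value = row.get(name)
        if value is None:
            value = defaults[name]
        if "buckets" in spec:
            # Replace the raw value with the index of the bucket it falls into.
            value = bisect.bisect_right(spec["buckets"], value)
        values[name] = value

    excluded = set(config.get("exclude", []))
    out = {k: v for k, v in values.items() if k not in excluded}
    for left, right in config.get("crosses", []):
        out[f"{left}_x_{right}"] = values[left] * values[right]
    return out


training = [
    {"age": 20, "visits": 3},
    {"age": 40, "visits": None},
    {"age": 60, "visits": 1},
]
defaults = infer_defaults(CONFIG, training)
features = transform({"age": None, "visits": 2}, CONFIG, defaults)
```

Because the defaults are computed once from the training rows and then passed into `transform`, the same values can be reused at testing and prediction time, which is the idea behind the dynamic config item above.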
Model Development
- The framework trains models using algorithms such as linear and logistic regression, and calculates metrics such as AUC and R^2 on the testing data. A runner script launches the training and testing run, with the model, model version, framework, and algorithm specified through flags.
- We are able to pull data for training and testing from both Postgres and Snowflake. We specify the training and testing time periods to determine the data split between training and testing.
- Local caching of SQL results and of processed training and testing instances. These caches are very useful when developing a model.
- Feature statistics: a flag on the runner script gets summary statistics on features for both the training and testing data.
- Feature selection: ability to specify combinations of features to cycle through to determine the best feature combination.
- Ability to override the config from the command line on the fly.
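As a rough illustration of a runner script with flags and on-the-fly config overrides, here is a minimal sketch built on `argparse`. The flag names (`--model`, `--model-version`, `--framework`, `--algorithm`, `--feature-stats`, `--override`) and the dotted `KEY=VALUE` override syntax are assumptions made for this example, not the framework's actual interface.

```python
import argparse
import copy
import json


def parse_override(text):
    """Parse a hypothetical KEY=VALUE override, e.g. features.age.default=30."""
    key, sep, raw = text.partition("=")
    if not key or not sep:
        raise argparse.ArgumentTypeError(f"expected KEY=VALUE, got {text!r}")
    try:
        value = json.loads(raw)  # numbers, booleans, lists, quoted strings
    except json.JSONDecodeError:
        value = raw  # fall back to the raw string
    return key, value


def apply_overrides(config, overrides):
    """Return a copy of config with dotted-key overrides applied."""
    merged = copy.deepcopy(config)
    for dotted_key, value in overrides:
        *parents, leaf = dotted_key.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged


def build_parser():
    parser = argparse.ArgumentParser(description="Train and test a model (sketch).")
    parser.add_argument("--model", required=True)
    parser.add_argument("--model-version", required=True)
    parser.add_argument("--framework", choices=["vw", "xgboost"], required=True)
    parser.add_argument("--algorithm", required=True)
    parser.add_argument("--feature-stats", action="store_true",
                        help="print summary statistics for each feature")
    parser.add_argument("--override", action="append", default=[],
                        type=parse_override, metavar="KEY=VALUE",
                        help="override a config value; may be repeated")
    return parser


args = build_parser().parse_args([
    "--model", "example_model",
    "--model-version", "v2",
    "--framework", "vw",
    "--algorithm", "logistic",
    "--override", "features.age.default=30",
])
base_config = {"features": {"age": {"default": "median"}}}
run_config = apply_overrides(base_config, args.override)
```

Applying overrides to a copy leaves the config file's contents untouched, so a quick experiment from the command line does not change what the next run reads from disk.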
Production / Real Time Prediction Service
- We built a Flask-based microservice with an endpoint for each model and version combination. The main website application calls the endpoint with an id and gets back a list of recommendations ranked by score. We also return metadata such as the model update time and the feature values used to calculate the scores.
- On the website application (client) side we store all of this information in a Postgres table.
- Since VW uses a daemon for inference, we create systemd services for each model and version combination being used in production. Support for multiple versions allows A/B testing on different versions of the same model.
- The Flask server also runs as a systemd service.
- Models are retrained daily so that they use the most recent data. When the run completes, we upload the model and the config file to S3, where other servers can pick them up.
- SLA of ~1 second for the current model.
- Monitoring and alerting through Datadog, which we use extensively to track microservice and model performance.
- Bootstrap script: responsible for setting up the machine learning framework on a new EC2 instance. The script can check out and install any git branch of the machine learning repository.
- Deploy script: responsible for stopping and restarting all the systemd services. This script can also deploy any git branch, which is very useful for testing. We use ship-it for regular deploys.
- Scalability: We use a load balancer that lets us distribute calls between multiple EC2 instances of the service. We also have CloudWatch alerting set up that notifies us when the load balancer determines a box is down.
- We store daily trained models and their corresponding configs in the cloud. The setup script looks for the latest model and config and downloads them.
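To make the shape of a prediction response more concrete, here is a minimal, framework-free sketch of what a handler behind each model and version endpoint might do. The registry, the `predict` function, and the response field names are all hypothetical. In the real service this logic would sit behind a Flask route, and the scores would come from the VW daemon or an XGBoost model rather than from a stub.

```python
from datetime import datetime, timezone

# Hypothetical registry mapping (model, version) to a scoring function.
# A scoring function takes an id and returns (scores, feature_values).
SCORERS = {}


def register(model, version, scorer):
    """Make a scorer available under its model and version combination."""
    SCORERS[(model, version)] = scorer


def predict(model, version, entity_id, model_updated_at):
    """Build a response with ranked recommendations plus metadata."""
    scorer = SCORERS.get((model, version))
    if scorer is None:
        raise KeyError(f"no scorer registered for {model}/{version}")

    scores, feature_values = scorer(entity_id)
    # Highest score first, so the client can use the list order directly.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return {
        "model": model,
        "version": version,
        "id": entity_id,
        "model_updated_at": model_updated_at.isoformat(),
        "recommendations": [
            {"item": item, "score": score} for item, score in ranked
        ],
        "features": feature_values,
    }


def stub_scorer(entity_id):
    # Stand-in for a call to the real model; values are made up for the example.
    return {"a": 0.2, "b": 0.9, "c": 0.5}, {"visits": 3}


register("example_model", "v1", stub_scorer)
response = predict(
    "example_model", "v1", 42,
    model_updated_at=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
```

Keying the registry on the model and version pair mirrors the one-endpoint-per-combination layout described above, and it lets two versions of the same model be served side by side for A/B testing.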
Future
- Rolling restarts: currently our traffic needs are met with just one EC2 instance, but as we deploy more models into production, we will need more EC2 instances. Model and code updates will then need rolling restarts on those boxes.
- Support for more machine learning libraries. Currently, we only use Vowpal Wabbit and XGBoost. We would like to extend support to libraries that enable us to build more complicated models like neural network models.
- We are looking into using the results of models as features for another model to power real time predictions. We have a lot of missing data at inference time and having models that predict that data should improve the predictive power of the higher level models.
- Ability to pull data from databases other than Postgres during inference.