cortex

module

v0.10.0 (Latest) | Published: Nov 5, 2019 | License: Apache-2.0

Repository

github.com/cortexlabs/cortex

README

Deploy machine learning models in production

Cortex is an open source platform that takes machine learning models—trained with nearly any framework—and turns them into production web APIs in one command.

Key features

Autoscaling: Cortex automatically scales APIs to handle production workloads.
Multi-framework: Cortex supports TensorFlow, PyTorch, scikit-learn, XGBoost, and more.
CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.
Rolling updates: Cortex updates deployed APIs without any downtime.
Log streaming: Cortex streams logs from deployed models to your CLI.
Prediction monitoring: Cortex monitors network metrics and tracks predictions.
Minimal configuration: Deployments are defined in a single cortex.yaml file.

Quickstart

Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to install Cortex on your AWS account before getting started.

Step 1: Define your deployment

The configuration below will download the model from the cortex-examples S3 bucket and deploy it as a web service that can serve real-time predictions.

# cortex.yaml

- kind: deployment
  name: text

- kind: api
  name: generator
  tensorflow:
    model: s3://cortex-examples/tensorflow/text-generator/gpt-2/124M
    request_handler: handler.py
  compute:
    gpu: 1

You can run the code that generated the model here.

Step 2: Add request handling

The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output as human-readable text.

# handler.py

from encoder import get_encoder
encoder = get_encoder()

def pre_inference(sample, signature, metadata):
    context = encoder.encode(sample["text"])
    return {"context": [context]}

def post_inference(prediction, signature, metadata):
    response = prediction["sample"]
    return encoder.decode(response)
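
To see how the two handlers wrap the model, the sketch below runs them locally against stand-ins. ToyEncoder and fake_model are hypothetical placeholders, not part of Cortex or of the GPT-2 encoder module; they only mimic the shapes implied by handler.py, where the model receives {"context": [tokens]} and returns {"sample": tokens}. The signature and metadata arguments are passed as None because the handlers above do not use them.

```python
# local_check.py: hypothetical local round trip, not part of Cortex


class ToyEncoder:
    """Stand-in for the real encoder: maps each distinct word to an integer id."""

    def __init__(self):
        self.word_to_id = {}
        self.id_to_word = {}

    def encode(self, text):
        ids = []
        for word in text.split():
            if word not in self.word_to_id:
                new_id = len(self.word_to_id)
                self.word_to_id[word] = new_id
                self.id_to_word[new_id] = word
            ids.append(self.word_to_id[word])
        return ids

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)


encoder = ToyEncoder()


def pre_inference(sample, signature, metadata):
    # Same logic as handler.py: encode the input text into token ids.
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, signature, metadata):
    # Same logic as handler.py: decode the model output back into text.
    response = prediction["sample"]
    return encoder.decode(response)


def fake_model(model_input):
    """Assumed stand-in for the served model: echoes the first context back."""
    return {"sample": model_input["context"][0]}


def handle(sample):
    """Chain the three stages in the order the text describes."""
    model_input = pre_inference(sample, signature=None, metadata=None)
    prediction = fake_model(model_input)
    return post_inference(prediction, signature=None, metadata=None)


if __name__ == "__main__":
    print(handle({"text": "machine learning"}))
```

Because fake_model simply echoes its input, the round trip returns the original text; with the real model the decoded output would be generated text instead.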

Step 3: Deploy to AWS

cortex deploy reads the declarative configuration in cortex.yaml and creates the resources it describes on the cluster.

$ cortex deploy

deployment started

You can track the status of a deployment using cortex get.

$ cortex get generator --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms

url: http://***.amazonaws.com/text/generator
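
If you want to check the status from a script, the table can be turned into a dictionary. This is a minimal sketch that assumes the two-row, whitespace-aligned layout shown above, with columns separated by two or more spaces; parse_status is a hypothetical helper, not a Cortex API, and the real CLI output may differ between versions.

```python
import re


def parse_status(output):
    """Map column headers to values for a header row followed by one data row."""
    lines = [line for line in output.splitlines() if line.strip()]
    # Columns are separated by runs of 2+ spaces; single spaces stay inside
    # names such as "last update".
    headers = re.split(r"\s{2,}", lines[0].strip())
    values = re.split(r"\s{2,}", lines[1].strip())
    return dict(zip(headers, values))


sample_output = """
status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms
"""

status = parse_status(sample_output)
print(status["status"], status["avg latency"])
```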

Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.

Step 4: Serve real-time predictions

Once you have your endpoint, you can make requests.

$ curl http://***.amazonaws.com/text/generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "machine learning"}'

Machine learning, with more than one thousand researchers around the world today, are looking to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment...
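
The same request can be sent from Python using only the standard library. This is a sketch under a few assumptions: build_request and generate are hypothetical helper names, the 30-second timeout is an arbitrary choice, and the endpoint argument is whatever URL cortex get reported for your API.

```python
import json
import urllib.request


def build_request(endpoint, text):
    """Build the POST request that mirrors the curl command above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(endpoint, text, timeout=30):
    """Send the request and return the response body as a string."""
    request = build_request(endpoint, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling generate(endpoint, "machine learning") with your API's URL should return generated text like the sample above, though the exact output varies from request to request.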

Any questions? Chat with us.

How Cortex works

The CLI sends configuration and code to the cluster every time you run cortex deploy. Each model is loaded from S3 into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), Flask, TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.

More examples

Sentiment analysis in TensorFlow with BERT
Image classification in TensorFlow with Inception v3
Text generation in PyTorch with Hugging Face's DistilGPT2
Iris classification in XGBoost / ONNX

Directories

Path	Synopsis
cli
cmd
pkg
consts
lib/aws
lib/cast
lib/clusterconfig
lib/configreader
lib/console
lib/debug
lib/errors
lib/files
lib/hash
lib/json
lib/k8s
lib/maps
lib/msgpack
lib/parallel
lib/pointer
lib/prompt
lib/random
lib/regex
lib/sets/strset
lib/slices
lib/strings
lib/table
lib/telemetry
lib/time
lib/urls
lib/zip
operator
operator/api/context
operator/api/resource
operator/api/schema
operator/api/userconfig
operator/config
operator/context
operator/endpoints
operator/workloads
