Simplify the deployment and monitoring of foundation models with DataRobot MLOps

Large language models, also known as foundation models, have gained significant traction in the field of machine learning. These models are pre-trained on large datasets, which allows them to perform well on a variety of tasks without requiring as much task-specific training data. Learn how you can easily deploy a pre-trained foundation model using DataRobot MLOps capabilities and then put the model into production. By leveraging the power of a pre-trained model, you can save time and resources while achieving high performance in your machine learning applications.

What are large language models?

The creation of foundation models is one of the key developments in the field of large language models, and it is generating a lot of excitement and interest among data scientists and machine learning engineers. These models are trained on massive amounts of text data using deep learning algorithms. They can produce human-like language that is coherent and relevant in a given context, and they can process and understand natural language at a level previously thought impossible. As a result, they have the potential to revolutionize the way we interact with machines and solve a wide range of machine learning problems.

These developments have allowed researchers to create models that can perform a wide range of natural language processing tasks, such as machine translation, summarization, question answering, and even dialog generation. They can also be used for creative tasks, such as generating realistic text for applications like product descriptions or news articles.

Overall, recent developments in large language models are very exciting and have the potential to greatly improve our ability to solve machine learning problems and interact with machines in a more natural and intuitive way.

Get started with language models using Hugging Face

As many machine learning practitioners already know, one easy way to get started with language models is to use Hugging Face. The Hugging Face model hub is a platform that offers a collection of pre-trained models that can be easily downloaded and used for a wide range of natural language processing tasks.

To get started with a language model from the Hugging Face model hub, you just need to install the Hugging Face library in your local notebook or in DataRobot Notebooks, if that’s what you’re using. If you already run your experiments on the DataRobot GUI, you can even add it as a custom task.

After installation, you can choose the model that suits your needs. You can then use the model to perform tasks such as text generation, classification, and translation. Models are easy to use and can be customized to your specific needs, making them a powerful tool for solving a variety of natural language processing problems.

If you don’t want to set up a local runtime environment, you can start with a Google Colab notebook on a CPU, GPU, or TPU runtime, download your model, and get model predictions in just a few lines.

As an example, getting started with a BERT model for question answering (bert-large-uncased-whole-word-masking-finetuned-squad) is as easy as running these lines.

!pip install transformers==4.25.1
from transformers import AutoTokenizer, TFBertForQuestionAnswering
MODEL = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = TFBertForQuestionAnswering.from_pretrained(MODEL)

Deploying language models to production

After trying out some models, possibly fine-tuning them for your specific use cases, and getting them ready for production, you’ll need a serving environment to host your artifacts. Beyond an environment to serve the model, you need to monitor its performance, health, and data and prediction drift, and you need an easy way to retrain it without disrupting your production workflows or the downstream applications that consume your model’s output.

This is where DataRobot MLOps comes into play. DataRobot MLOps provides a platform for hosting and deploying custom model packages built with various ML frameworks, such as PyTorch, TensorFlow, ONNX, and scikit-learn, allowing organizations to easily integrate their pre-trained models into their existing applications and use them for their business needs.

To deploy a pre-trained language model on DataRobot MLOps services, you simply upload the model to the platform, create its runtime environment with your custom dependency packages, and deploy it to the DataRobot servers. Your deployment will be ready in a few minutes, then you can send your prediction requests to your deployment endpoint and enjoy your model in production.

While you can perform all of these operations from the DataRobot UI, here we’ll show you how to implement an end-to-end workflow using the DataRobot API in a notebook. So let’s get started.

You can follow this tutorial by creating a new Google Colab notebook or copying our notebook from our DataRobot community repository and running the copied notebook in Google Colab.

Install the dependencies

!pip install transformers==4.25.1 datarobot==3.0.2
from transformers import AutoTokenizer, TFBertForQuestionAnswering
import numpy as np

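Download the BERT model from Hugging Face to the notebook environment

MODEL = "bert-large-uncased-whole-word-masking-finetuned-squad"
BASE_PATH = "/content/datarobot_blogpost"
# Download the tokenizer and model, then save both to BASE_PATH so the
# folder can be uploaded to DataRobot and read back by load_model().
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.save_pretrained(BASE_PATH)
model = TFBertForQuestionAnswering.from_pretrained(MODEL)
model.save_pretrained(BASE_PATH)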

Deploy to DataRobot

Create the inference (glue) script

This inference script acts as the glue between your model artifacts and the Custom Model execution in DataRobot. If this is your first time building a custom model in DataRobot MLOps, our public repository is a great starting point, with many examples of model templates for different ML frameworks and model types, such as binary or multiclass classification, regression, anomaly detection, or unstructured models like the one we will create in our example.

%%writefile $BASE_PATH/

# Copyright 2021 DataRobot, Inc. and its affiliates.
# All rights reserved.
# This is proprietary source code of DataRobot, Inc. and its affiliates.
# Released under the terms of DataRobot Tool and Utility Agreement.
import io
import json
import os
import time

import pandas as pd
import tensorflow as tf
from transformers import AutoTokenizer, TFBertForQuestionAnswering

def load_model(input_dir):
   # Load the tokenizer and the TensorFlow model from the uploaded artifact folder
   tokenizer = AutoTokenizer.from_pretrained(input_dir)
   tf_model = TFBertForQuestionAnswering.from_pretrained(
       input_dir, return_dict=True
   )
   return tf_model, tokenizer

def log_for_drum(msg):
   os.write(1, f"\n{msg}\n".encode("UTF-8"))

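def _get_answer_in_text(output, input_ids, idx, tokenizer):
   # Most likely start and end token positions for example `idx` in the batch
   answer_start = tf.argmax(output.start_logits, axis=1).numpy()[idx]
   answer_end = (tf.argmax(output.end_logits, axis=1) + 1).numpy()[idx]
   # Convert the selected token ids back into a readable string
   answer = tokenizer.convert_tokens_to_string(
       tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
   )
   return answer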

def score_unstructured(model, data, query, **kwargs):
   tf_model, tokenizer = model

   # Assume batch input is sent with mimetype:"text/csv"
   # Treat as single prediction input if no mimetype is set
   is_batch = kwargs.get("mimetype") == "text/csv"

   if is_batch:
       # Batch input: pipe-separated CSV with "abstract" and "question" columns
       input_pd = pd.read_csv(io.StringIO(data), sep="|")
       input_pairs = list(zip(input_pd["abstract"], input_pd["question"]))

       start = time.time()
       inputs = tokenizer.batch_encode_plus(
           input_pairs, add_special_tokens=True, padding=True, return_tensors="tf"
       )
       input_ids = inputs["input_ids"].numpy()
       output = tf_model(inputs)
       responses = []
       for i, row in input_pd.iterrows():
           answer = _get_answer_in_text(output, input_ids[i], i, tokenizer)
           response = {
               "abstract": row["abstract"],
               "question": row["question"],
               "answer": answer,
           }
           responses.append(response)
       pred_duration = time.time() - start
       to_return = json.dumps(
           {
               "predictions": responses,
               "pred_duration": pred_duration,
           }
       )
   else:
       # Single input: a JSON object with "abstract" and "question" keys
       data_dict = json.loads(data)
       abstract, question = data_dict["abstract"], data_dict["question"]
       start = time.time()
       # The argument list was lost in the source; question first and context
       # second is the order SQuAD-style models expect.
       inputs = tokenizer(
           question,
           abstract,
           add_special_tokens=True,
           return_tensors="tf",
       )
       input_ids = inputs["input_ids"].numpy()[0]
       output = tf_model(inputs)
       answer = _get_answer_in_text(output, input_ids, 0, tokenizer)
       pred_duration = time.time() - start
       to_return = json.dumps(
           {
               "abstract": abstract,
               "question": question,
               "answer": answer,
               "pred_duration": pred_duration,
           }
       )
   return to_return
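
To make the answer-extraction step in `_get_answer_in_text` more concrete, here is a minimal, framework-free sketch of the same idea: pick the most likely start and end positions, then join the tokens in between. The `argmax` and `extract_answer` helpers, the token list, and the scores are illustrative assumptions, not part of the inference script or of the Hugging Face API.

```python
def argmax(values):
    """Return the index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def extract_answer(tokens, start_logits, end_logits):
    """Join the tokens between the best start and the best end position."""
    start = argmax(start_logits)
    end = argmax(end_logits) + 1  # +1 so the end token is included in the slice
    return " ".join(tokens[start:end])


# Hypothetical tokens and scores, for illustration only
tokens = ["[CLS]", "where", "?", "[SEP]", "in", "health", "care", "[SEP]"]
start_logits = [0.1, 0.0, 0.0, 0.0, 0.2, 3.5, 0.3, 0.0]
end_logits = [0.0, 0.1, 0.0, 0.0, 0.1, 0.4, 4.2, 0.0]

answer = extract_answer(tokens, start_logits, end_logits)
print(answer)
```

The real script does the same thing on tensors, and it uses the tokenizer to turn token ids back into text instead of joining strings with spaces.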

Create the requirements file

%%writefile $BASE_PATH/requirements.txt


Upload model artifacts and inference script to DataRobot

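import datarobot as dr

def deploy_to_datarobot(folder_path, env_name, model_name, descr):
   API_TOKEN = "YOUR_API_TOKEN"  # See the DataRobot documentation to get your token
   dr.Client(token=API_TOKEN, endpoint="")  # set endpoint to your DataRobot API URL

   # Look up the base execution environment by name
   onnx_execution_env = dr.ExecutionEnvironment.list(search_for=env_name)[0]

   # Register an unstructured custom inference model
   custom_model = dr.CustomInferenceModel.create(
       name=model_name,
       target_type=dr.TARGET_TYPE.UNSTRUCTURED,
       description=descr,
       language="python",
   )
   print(f"Creating custom model version on {onnx_execution_env}...")
   model_version = dr.CustomModelVersion.create_clean(
       custom_model_id=custom_model.id,
       base_environment_id=onnx_execution_env.id,
       folder_path=folder_path,
       maximum_memory=4096 * 1024 * 1024,
   )
   print(f"Created {model_version}.")

   versions = dr.CustomModelVersion.list(custom_model.id)
   sorted_versions = sorted(versions, key=lambda v: v.label)
   latest_version = sorted_versions[-1]
   print("Building the execution environment with dependency packages...")
   build_info = dr.CustomModelVersionDependencyBuild.start_build(
       custom_model_id=custom_model.id,
       custom_model_version_id=latest_version.id,
       max_wait=3600,  # large dependencies can take a while to install
   )
   print(f"Environment build completed with {build_info.build_status}.")

   print("Creating model deployment...")
   default_prediction_server = dr.PredictionServer.list()[0]
   deployment = dr.Deployment.create_from_custom_model_version(
       latest_version.id,
       label=model_name,
       description=descr,
       default_prediction_server_id=default_prediction_server.id,
   )
   print(f"{deployment} is ready!")
   return deployment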

Create a model deployment

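deployment = deploy_to_datarobot(BASE_PATH,
                                 "YOUR_EXECUTION_ENVIRONMENT_NAME",  # placeholder: an environment that can run TensorFlow models
                                 MODEL,
                                 "Pretrained BERT model, fine-tuned on SQUAD for question answering")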

Experimenting with prediction queries

The following script is designed to make predictions against your deployment. You can get the same script from your DataRobot account by going to the Deployments tab, opening the deployment you just created, navigating to the Predictions tab, and then opening the Prediction API Scripting Code -> Single section.

It will look like the example below where you will see your own API_KEY and DATAROBOT_KEY populated.

"""
Usage:
   python <input-file> [mimetype] [charset]

This example uses the requests library which you can install with:
   pip install requests
We highly recommend that you update SSL certificates with:
   pip install -U urllib3[secure] certifi
"""
import sys
import json
import requests

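API_URL = '{deployment_id}/predictionsUnstructured'
API_KEY = 'YOUR_API_KEY'
DATAROBOT_KEY = 'YOUR_DATAROBOT_KEY'

# Don't change this. It is enforced server-side too.
MAX_PREDICTION_FILE_SIZE_BYTES = 52428800  # 50 MB

class DataRobotPredictionError(Exception):
   """Raised if there are issues getting predictions from DataRobot"""
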
def make_datarobot_deployment_unstructured_predictions(data, deployment_id, mimetype, charset):
   """
   Make unstructured predictions on data provided using DataRobot deployment_id provided.
   See the DataRobot Prediction API docs for details.

   Parameters
   ----------
   data : bytes
       Bytes data read from provided file.
   deployment_id : str
       The ID of the deployment to make predictions with.
   mimetype : str
       Mimetype describing data being sent.
       If mimetype starts with 'text/' or equal to 'application/json',
       data will be decoded with provided or default(UTF-8) charset
       and passed into the 'score_unstructured' hook implemented in
       the inference script provided with the model.

       In case of other mimetype values data is treated as binary and passed without decoding.
   charset : str
       Charset should match the contents of the file, if file is text.

   Returns
   -------
   data : bytes
       Arbitrary data returned by unstructured model.

   Raises
   ------
   DataRobotPredictionError if there are issues getting predictions from DataRobot
   """
   # Set HTTP headers. The charset should match the contents of the file.
   headers = {
       'Content-Type': '{};charset={}'.format(mimetype, charset),
       'Authorization': 'Bearer {}'.format(API_KEY),
       'DataRobot-Key': DATAROBOT_KEY,
   }

   url = API_URL.format(deployment_id=deployment_id)

   # Make API request for predictions
   predictions_response = requests.post(url, data=data, headers=headers)
   _raise_dataroboterror_for_status(predictions_response)
   # Return raw response content
   return predictions_response.content

def _raise_dataroboterror_for_status(response):
   """Raise DataRobotPredictionError if the request fails along with the response returned"""
   try:
       response.raise_for_status()
   except requests.exceptions.HTTPError:
       err_msg = '{code} Error: {msg}'.format(
           code=response.status_code, msg=response.text)
       raise DataRobotPredictionError(err_msg)

def datarobot_predict_file(filename, deployment_id, mimetype="text/csv", charset="utf-8"):
   """
   Read a file and send its contents to the deployment.
   Returns the raw response bytes, 1 if the input is too large,
   or None if the prediction request fails.
   Also useful as a usage demonstration of
   `make_datarobot_deployment_unstructured_predictions(data, deployment_id, mimetype, charset)`
   """
   with open(filename, 'rb') as f:
       data = f.read()
   data_size = sys.getsizeof(data)
   if data_size >= MAX_PREDICTION_FILE_SIZE_BYTES:
       print((
                 'Input file is too large: {} bytes. '
                 'Max allowed size is: {} bytes.'
             ).format(data_size, MAX_PREDICTION_FILE_SIZE_BYTES))
       return 1
   try:
       predictions = make_datarobot_deployment_unstructured_predictions(data, deployment_id, mimetype, charset)
       return predictions
   except DataRobotPredictionError as exc:
       print(exc)
       return None

def datarobot_predict(input_dict, deployment_id, mimetype="application/json", charset="utf-8"):
   """
   Send a single JSON request to the deployment.
   Returns the model's answer, 1 if the input is too large,
   or None if the prediction request fails.
   Also useful as a usage demonstration of
   `make_datarobot_deployment_unstructured_predictions(data, deployment_id, mimetype, charset)`
   """
   data = json.dumps(input_dict).encode(charset)
   data_size = sys.getsizeof(data)
   if data_size >= MAX_PREDICTION_FILE_SIZE_BYTES:
       print((
                 'Input file is too large: {} bytes. '
                 'Max allowed size is: {} bytes.'
             ).format(data_size, MAX_PREDICTION_FILE_SIZE_BYTES))
       return 1
   try:
       predictions = make_datarobot_deployment_unstructured_predictions(data, deployment_id, mimetype, charset)
       return json.loads(predictions)['answer']
   except DataRobotPredictionError as exc:
       print(exc)
       return None
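
The docstring above describes how the mimetype decides whether the request body is decoded as text or passed through as binary. As a rough illustration of that rule and of the `Content-Type` header the script builds, here is a small sketch. The helper names `is_decoded_as_text` and `content_type_header` are hypothetical and only mirror the description in the docstring; they are not DataRobot functions.

```python
def is_decoded_as_text(mimetype):
    """Return True for 'text/*' and 'application/json', the types the docstring says are decoded."""
    base = mimetype.split(";")[0].strip().lower()  # ignore any ';charset=...' suffix
    return base.startswith("text/") or base == "application/json"


def content_type_header(mimetype, charset="utf-8"):
    """Build the Content-Type value the same way the prediction script does."""
    return "{};charset={}".format(mimetype, charset)


for mimetype in ("text/csv", "application/json", "image/png"):
    print(mimetype, "->", content_type_header(mimetype), "| decoded as text:", is_decoded_as_text(mimetype))
```

In our example, single requests use `application/json` and batch requests use `text/csv`, so both reach the `score_unstructured` hook as decoded text.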

Now that we have an auto-generated script to make our predictions, it’s time to send a test prediction request. Let’s create a JSON object to ask our question-answering BERT model a question. We will provide a long abstract as context, along with a question based on that abstract.

test_input = {"abstract": "Healthcare tasks (e.g., patient care via disease treatment) and biomedical research (e.g., scientific discovery of new therapies) require expert knowledge that is limited and expensive. Foundation models present clear opportunities in these domains due to the abundance of data across many modalities (e.g., images, text, molecules) to train foundation models, as well as the value of improved sample efficiency in adaptation due to the cost of expert time and knowledge. Further, foundation models may allow for improved interface design (§2.5: interaction) for both healthcare providers and patients to interact with AI systems, and their generative capabilities suggest potential for open-ended research problems like drug discovery. Simultaneously, they come with clear risks (e.g., exacerbating historical biases in medical datasets and trials). To responsibly unlock this potential requires engaging deeply with the sociotechnical matters of data sources and privacy as well as model interpretability and explainability, alongside effective regulation of the use of foundation models for both healthcare and biomedicine.", "question": "Where can we use foundation models?"}
datarobot_predict(test_input, deployment.id)


We can see that our model returns the answer we expected.

> both healthcare and biomedicine
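
The inference script also has a batch mode: when the mimetype is `text/csv`, it reads a pipe-separated file with `abstract` and `question` columns. A minimal sketch of building such a payload with the standard library could look like the following. The `build_batch_payload` helper and the sample rows are assumptions made for illustration.

```python
import csv
import io


def build_batch_payload(rows, charset="utf-8"):
    """Serialize rows into pipe-separated CSV bytes with 'abstract' and 'question' columns."""
    buffer = io.StringIO()
    # Fields that contain '|' or quotes are quoted automatically by csv.writer
    writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
    writer.writerow(["abstract", "question"])
    for row in rows:
        writer.writerow([row["abstract"], row["question"]])
    return buffer.getvalue().encode(charset)


# Hypothetical sample rows, for illustration only
rows = [
    {"abstract": "Foundation models are trained on broad data.",
     "question": "What are foundation models trained on?"},
    {"abstract": "Text | images | molecules are common modalities.",
     "question": "Which modalities are common?"},
]

payload = build_batch_payload(rows)
print(payload.decode("utf-8"))
```

These bytes could then be passed to `make_datarobot_deployment_unstructured_predictions` with the `text/csv` mimetype, and the response would contain a `predictions` list instead of a single answer.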

Easily monitor machine learning models with DataRobot MLOps

Now that we have our question-answering model successfully up and running, let’s take a look at our service health dashboard in DataRobot MLOps. As we send prediction requests to our model, the Service Health tab will display the newly received requests and allow us to track our model’s metrics.

Service Health Dashboard in DataRobot MLOps
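
Because the inference script returns a `pred_duration` value with every response, a client could also keep a simple latency summary on its own side to compare with what the dashboard shows. The sketch below assumes raw JSON responses shaped like the ones our script returns; the `summarize_durations` helper and the sample values are illustrative only.

```python
import json
import statistics


def summarize_durations(raw_responses):
    """Collect pred_duration from raw JSON responses and compute basic statistics."""
    durations = [json.loads(raw)["pred_duration"] for raw in raw_responses]
    return {
        "count": len(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }


# Hypothetical responses, for illustration only
raw_responses = [
    '{"answer": "a", "pred_duration": 0.5}',
    '{"answer": "b", "pred_duration": 1.5}',
]

summary = summarize_durations(raw_responses)
print(summary)
```

Note that `pred_duration` only covers the time spent inside the scoring hook, so it will not include network time or request queuing.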

Later, if we want to update our deployment with a newer version of the pre-trained model artifact or update our custom inference script, we can again use the API or the Custom Model Workshop UI to seamlessly make any necessary changes to our deployment.

Start using large language models

By hosting a language model with DataRobot MLOps, organizations can take advantage of the power and flexibility of large language models without worrying about the technical details of model management and deployment.

In this blog post, we showed how easy it is to host a large language model as a DataRobot custom model in just a few minutes by running an end-to-end script. You can find the end-to-end notebook in the DataRobot community repository, make a copy of it to edit for your needs, and accelerate your own production model.

About the author

Aslı Sabancı Demiröz

Senior Machine Learning Engineer, DataRobot

Aslı Sabancı Demiröz is a Senior Machine Learning Engineer at DataRobot. She holds a bachelor’s degree in computer engineering with a double major in control engineering from Istanbul Technical University. Working in the CTO’s office, she enjoys being at the heart of DataRobot’s R&D to drive innovation. Her passion is in the deep learning space, and she particularly enjoys building powerful integrations between the platform and application layers in the ML ecosystem, with the goal of making the whole greater than the sum of the parts.
