Create external LLMs with code¶
The following workflow, designed for use with DataRobot Notebooks, outlines how to build and validate an external LLM using the DataRobot Python client. DataRobot recommends downloading this notebook and uploading it for use in the platform.
Note: For self-managed users, code samples that reference app.datarobot.com must be changed to the appropriate URL for your instance.
Setup¶
The following steps outline the configuration necessary for integrating an external LLM with the DataRobot platform.
Verify that the following feature flags are enabled. Contact your DataRobot representative or administrator for information on enabling these features.
- Enable Notebooks Filesystem Management
- Enable Proxy models
- Enable Public Network Access for all Custom Models
- Enable the Injection of Runtime Parameters for Custom Models
- Enable Monitoring Support for Generative Models
- Enable Custom Inference Models
Create a new credential in the DataRobot Credentials Management tool:
- Set it as an "API Token" type credential.
- Set the display name to OPENAI_API_KEY.
- Place your OpenAI API key in the Token field.
Add the notebook environment variables OPENAI_API_BASE, OPENAI_API_KEY, OPENAI_API_VERSION, and OPENAI_DEPLOYMENT_NAME; set the values with your Azure OpenAI credentials.
Set the notebook session timeout to 180 minutes.
Install libraries¶
Install the following libraries:
!pip install "langchain==0.0.244" \
"openai==0.27.8" \
"datarobotx==0.1.25"
import datarobot as dr
import datarobotx as drx
from datarobot.models.genai.custom_model_llm_validation import CustomModelLLMValidation
Connect to DataRobot¶
Read more about different options for connecting to DataRobot from the Python client.
endpoint = "https://app.datarobot.com/api/v2"
token = "<ADD_VALUE_HERE>"
dr.Client(endpoint=endpoint, token=token)
drx.Context(token=token, endpoint=endpoint)
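Rather than hard-coding the token in the notebook, you can read the connection details from the environment. The sketch below assumes the variable names DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN, which the Python client also recognizes when dr.Client() is called with no arguments:

```python
import os

# Assumed variable names; set these in your notebook or shell environment.
# Falls back to the placeholder values used elsewhere in this walkthrough.
endpoint = os.environ.get("DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2")
token = os.environ.get("DATAROBOT_API_TOKEN", "<ADD_VALUE_HERE>")
```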
Define hooks for deploying a text generation custom model¶
The following cell defines the methods used to deploy a text generation custom model. These include loading the custom model and using the model for scoring.
import os
import pandas as pd
OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE", "<ADD_VALUE_HERE>")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "<ADD_VALUE_HERE>")
OPENAI_API_TYPE = os.environ.get("OPENAI_API_TYPE", "azure")
OPENAI_API_VERSION = os.environ.get("OPENAI_API_VERSION", "<ADD_VALUE_HERE>")
OPENAI_DEPLOYMENT_NAME = os.environ.get("OPENAI_DEPLOYMENT_NAME", "<ADD_VALUE_HERE>")

PROMPT_COLUMN_NAME = "prompt"
COMPLETION_COLUMN_NAME = "completion"
ERROR_COLUMN_NAME = "error"
def load_model(*args, **kwargs):
    """Custom model hook for loading our LLM."""
    import os

    from langchain.chat_models import AzureChatOpenAI

    try:
        import datarobot_drum as drum

        api_key = drum.RuntimeParameters.get("OPENAI_API_KEY")["apiToken"]
    except Exception:
        api_key = os.environ.get("OPENAI_API_KEY", "<ADD_VALUE_HERE>")

    llm = AzureChatOpenAI(
        deployment_name=OPENAI_DEPLOYMENT_NAME,
        openai_api_type=OPENAI_API_TYPE,
        openai_api_base=OPENAI_API_BASE,
        openai_api_version=OPENAI_API_VERSION,
        openai_api_key=api_key,
        model_name=OPENAI_DEPLOYMENT_NAME,
        temperature=0.4,
        verbose=True,
        max_retries=0,
        request_timeout=20,
    )
    return llm
def score(data: pd.DataFrame, model, **kwargs):
    """Custom model hook for making predictions with our LLM.

    When requesting predictions from the deployment, pass a pandas
    DataFrame with the PROMPT_COLUMN_NAME column.
    datarobot-user-models (DRUM) handles loading the model and calling
    this function with the appropriate parameters.
    """
    import pandas as pd

    llm = model
    completions = []
    errors = []
    prompts = data[PROMPT_COLUMN_NAME].tolist()
    for prompt in prompts:
        completion = None
        error = None
        try:
            completion = llm.predict(prompt)
        except Exception as e:
            error = f"{e.__class__.__name__}: {str(e)}"
        completions.append(completion)
        errors.append(error)
    return pd.DataFrame(
        {
            PROMPT_COLUMN_NAME: prompts,
            COMPLETION_COLUMN_NAME: completions,
            ERROR_COLUMN_NAME: errors,
        }
    )
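Because score catches exceptions per prompt, one failing request does not abort the batch; the failure is recorded in the error column instead. The standalone sketch below, with a stub model in place of the Azure client, mirrors that per-row try/except pattern:

```python
import pandas as pd


class StubLLM:
    """Stand-in for the Azure LLM; raises on prompts containing 'fail'."""

    def predict(self, prompt):
        if "fail" in prompt:
            raise ValueError("simulated API error")
        return f"echo: {prompt}"


def score_rows(data, model):
    # Same per-row error handling as the score hook above
    completions, errors = [], []
    for prompt in data["prompt"]:
        completion = None
        error = None
        try:
            completion = model.predict(prompt)
        except Exception as e:
            error = f"{e.__class__.__name__}: {e}"
        completions.append(completion)
        errors.append(error)
    return pd.DataFrame(
        {"prompt": data["prompt"], "completion": completions, "error": errors}
    )


result = score_rows(pd.DataFrame({"prompt": ["hello", "please fail"]}), StubLLM())
```

The second row ends up with a populated error column while the first completes normally, so a partially failed batch still returns one row per prompt.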
Test hooks locally¶
Before proceeding with the deployment, use the cells below to test that the custom model hooks function correctly.
import pandas as pd
# Test the hooks locally
score(
    pd.DataFrame(
        {
            PROMPT_COLUMN_NAME: ["What is a large language model (LLM)?"],
        }
    ),
    load_model(),
)
Deploy the LLM¶
The cell below uses a convenience method that does the following:
- Deploys a text generation external model (LLM) to DataRobot.
- Returns an object that can be used to make predictions.
This example uses a pre-built environment. For shorter iteration cycles on the custom model hooks, you can instead provide an environment_id that references an existing custom model environment. You can view your account's pre-built environments in the model workshop.
deployment = drx.deploy(
    model=None,
    name="External Azure OpenAI LLM",
    hooks={
        "score": score,
        "load_model": load_model,
    },
    runtime_parameters=["OPENAI_API_KEY"],
    extra_requirements=["langchain==0.0.244", "openai==0.27.8"],
    environment_id=dr.ExecutionEnvironment.list("Python 3.9 GenAI")[0].id,
    target_type="TextGeneration",
    target=COMPLETION_COLUMN_NAME,
)
Test the deployment¶
Test that the deployment can successfully provide responses to prompts.
deployment.predict(
    pd.DataFrame(
        {
            PROMPT_COLUMN_NAME: [
                "Give me some context on large language models and their applications.",
                "What is AutoML?",
            ],
        }
    )
)
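Because the score hook returns an error column alongside each completion, failed prompts can be surfaced after a batch prediction. A minimal sketch, using mock response data in the same shape as the hook's output:

```python
import pandas as pd

# Hypothetical response frame matching the shape the score hook returns;
# the completion and error values here are mock data for illustration
response = pd.DataFrame(
    {
        "prompt": ["What is AutoML?", "What is an LLM?"],
        "completion": ["Automated machine learning.", None],
        "error": [None, "Timeout: Request timed out"],
    }
)

# Keep only the rows where the LLM request failed
failed = response[response["error"].notna()]
print(f"{len(failed)} of {len(response)} prompts failed")
```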
Validate the external LLM¶
The following methods validate the external LLM.
This example associates a Use Case with the validation and creates the vector database within that Use Case. Set use_case_id to specify an existing Use Case, or create a new one.
use_case_id = "<ADD_VALUE_HERE>"
use_case = dr.UseCase.get(use_case_id)

# Uncomment to create a new Use Case instead
# use_case = dr.UseCase.create()
CustomModelLLMValidation.create executes the validation of the external LLM. Be sure to provide the deployment ID.
external_llm_validation = CustomModelLLMValidation.create(
    prompt_column_name=PROMPT_COLUMN_NAME,
    target_column_name=COMPLETION_COLUMN_NAME,
    deployment_id=deployment.dr_deployment.id,
    name="My External LLM",
    use_case=use_case,
    wait_for_completion=True,
)

assert external_llm_validation.validation_status == "PASSED"
print(f"External LLM Validation ID: {external_llm_validation.id}")
This external LLM can now be used in the GenAI E2E walkthrough, for example, to create the LLM blueprint.