Prompt Experiments

Where is this feature available?

Hobby: Public Beta
Pro: Public Beta
Team: Public Beta
Self Hosted: Enterprise Edition

Prompt Experiments lets you test a prompt version from Prompt Management on a Dataset of inputs and expected outputs. This way, you can verify that a change yields the expected outputs and does not cause regressions. You can then analyze the results of different prompt experiments side by side.

Optionally, you can use LLM-as-a-Judge Evaluators to automatically score the responses against the expected outputs and analyze the results at an aggregate level.

This is a no-code feature within Langfuse. You can run more complex experiments via the Langfuse SDKs/API. Follow this guide to get started.

Key benefits

Feedback loop: Quickly iterate on prompts by running experiments and directly comparing evaluation results side-by-side.
Regression prevention: When changing a prompt, run an experiment to check that the new version does not produce worse outputs on your test cases.

Availability

Prompt Experiments is currently in public beta on Langfuse Cloud. It will be released for self-hosted users in Langfuse v3 (Pro plan) as it depends on parts of the new v3 infrastructure.

Overview

Introduction to Prompt Experiments

Setup

If you already have a dataset and a prompt, you can skip the following steps.

In Prompt Experiments, the input of each dataset item is mapped to the variables of the prompt. In the following example, the variables (documentation and question) are mapped to the keys of the dataset item's input, which is a JSON object. The expected output contains a reference answer for that dataset item.
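
The mapping can be pictured as plain placeholder substitution. The sketch below is only an illustration of the idea, not Langfuse's implementation: `render` and `render_chat` are hypothetical helper names, the shortened system message is made up for the example, and leaving unknown placeholders untouched is an assumption made here so that mismatches stay visible.

```python
import re
from typing import Dict, List, Mapping

# Matches {{name}} placeholders, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: Mapping[str, object]) -> str:
    """Replace each {{name}} with the matching value from a dataset item's input."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Assumption: unknown placeholders are left as-is so they stay visible.
        return str(variables[name]) if name in variables else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


def render_chat(
    messages: List[Dict[str, str]], variables: Mapping[str, object]
) -> List[Dict[str, str]]:
    """Apply render() to the content of every chat message."""
    return [
        {"role": m["role"], "content": render(m["content"], variables)}
        for m in messages
    ]


prompt = [
    {"role": "system", "content": "Answer based on this documentation:\n{{documentation}}"},
    {"role": "user", "content": "{{question}}"},
]
item_input = {
    "question": "What is Langfuse?",
    "documentation": "Langfuse - the LLM Engineering Platform",
}
rendered = render_chat(prompt, item_input)
```

Each key of the item's input fills the placeholder of the same name, so the user message in `rendered` contains the question text and the system message contains the documentation text.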

Configure LLM connection

Prompt Experiments runs LLM calls within Langfuse. Thus, you need to configure an LLM connection in the project settings.

Supported LLM providers

OpenAI, or OpenAI-compatible providers (e.g. LiteLLM, Google Vertex AI)
Anthropic
Azure OpenAI
AWS Bedrock

Create a dataset

Create a dataset with the inputs and expected outputs that you want to test your prompt on.

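from langfuse import Langfuse

# Reads credentials from the LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST environment variables
langfuse = Langfuse()

langfuse.create_dataset(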
    name="<dataset_name>",
    # optional description
    description="My first dataset",
    # optional metadata
    metadata={
        "author": "Alice",
        "date": "2022-01-01",
        "type": "benchmark"
    }
)

See low-level SDK docs for details on how to initialize the Python client.

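import { Langfuse } from "langfuse";

// Assumes credentials are provided via environment variables
const langfuse = new Langfuse();

langfuse.createDataset({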
  name: "<dataset_name>",
  // optional description
  description: "My first dataset",
  // optional metadata
  metadata: {
    author: "Alice",
    date: "2022-01-01",
    type: "benchmark",
  },
});

In the Langfuse UI: Datasets > + New dataset

Create dataset items with test cases

Dataset items include the input variables that should be inserted into the prompt.

Example Dataset Item with variables

input

{
  "question": "What is Langfuse?",
  "documentation": "Langfuse - the LLM Engineering Platform"
}

expected_output

Langfuse is the LLM Engineering Platform.
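
To create items shaped like the example above from your own test cases, a small loop around `create_dataset_item` may be enough. The following is a minimal sketch under stated assumptions: `upload_qa_items` and `RecordingClient` are hypothetical names, the row layout is invented for illustration, and `RecordingClient` only records calls so that the snippet runs without credentials.

```python
from typing import Any, Dict, Iterable, List, Mapping


class RecordingClient:
    """Stand-in for the Langfuse client that only records calls (illustration only)."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def create_dataset_item(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


def upload_qa_items(
    client: Any, dataset_name: str, rows: Iterable[Mapping[str, str]]
) -> int:
    """Create one dataset item per row and return the number of items created."""
    count = 0
    for row in rows:
        client.create_dataset_item(
            dataset_name=dataset_name,
            # Keys mirror the {{question}} and {{documentation}} prompt variables.
            input={
                "question": row["question"],
                "documentation": row["documentation"],
            },
            # Reference answer for this test case.
            expected_output=row["expected_output"],
        )
        count += 1
    return count


rows = [
    {
        "question": "What is Langfuse?",
        "documentation": "Langfuse - the LLM Engineering Platform",
        "expected_output": "Langfuse is the LLM Engineering Platform.",
    }
]
client = RecordingClient()
created = upload_qa_items(client, "<dataset_name>", rows)
```

With a real project, pass an initialized Langfuse client instead of `RecordingClient()`.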

langfuse.create_dataset_item(
    dataset_name="<dataset_name>",
    # any python object or value, optional
    input={
        "text": "hello world"
    },
    # any python object or value, optional
    expected_output={
        "text": "hello world"
    },
    # metadata, optional
    metadata={
        "model": "llama3",
    }
)

See low-level SDK docs for details on how to initialize the Python client.

langfuse.createDatasetItem({
  datasetName: "<dataset_name>",
  // any JS object or value
  input: {
    text: "hello world",
  },
  // any JS object or value, optional
  expectedOutput: {
    text: "hello world",
  },
  // metadata, optional
  metadata: {
    model: "llama3",
  },
});

In the Langfuse UI: Datasets > Items > + New item

Create a prompt with variables

Use {{variables}} to insert the dataset variables into the prompt during experiments.

Example Prompt

system

You are a Langfuse expert. Please answer questions based on the following documentation:

DOCUMENTATION
{{documentation}}

user

{{question}}

# Create a text prompt
langfuse.create_prompt(
    name="movie-critic",
    type="text",
    prompt="As a {{criticlevel}} movie critic, do you like {{movie}}?",
    labels=["production"],  # directly promote to production
    config={
        "model": "gpt-3.5-turbo",
        "temperature": 0.7,
        "supported_languages": ["en", "fr"],
    },  # optionally, add configs (e.g. model parameters or model tools) or tags
)
 
# Create a chat prompt
langfuse.create_prompt(
    name="movie-critic-chat",
    type="chat",
    prompt=[
      { "role": "system", "content": "You are an {{criticlevel}} movie critic" },
      { "role": "user", "content": "Do you like {{movie}}?" },
    ],
    labels=["production"],  # directly promote to production
    config={
        "model": "gpt-3.5-turbo",
        "temperature": 0.7,
        "supported_languages": ["en", "fr"],
    },  # optionally, add configs (e.g. model parameters or model tools) or tags
)

If you already have a prompt with the same name, the prompt will be added as a new version.
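
The Example Prompt above can be expressed with the same `create_prompt` arguments. This is a hedged sketch: `docs_qa_prompt_kwargs` and the prompt name "docs-qa" are hypothetical, and `labels` is left out on the assumption that it is optional, so the version is not promoted to production before it has been tested.

```python
from typing import Any, Dict


def docs_qa_prompt_kwargs(name: str) -> Dict[str, Any]:
    """Build create_prompt arguments for the documentation Q&A chat prompt."""
    system = (
        "You are a Langfuse expert. Please answer questions based on the "
        "following documentation:\n\nDOCUMENTATION\n{{documentation}}"
    )
    return {
        "name": name,
        "type": "chat",
        "prompt": [
            {"role": "system", "content": system},
            {"role": "user", "content": "{{question}}"},
        ],
    }


kwargs = docs_qa_prompt_kwargs("docs-qa")  # "docs-qa" is a hypothetical prompt name
# With an initialized client: langfuse.create_prompt(**kwargs)
```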

// Create a text prompt
await langfuse.createPrompt({
  name: "movie-critic",
  type: "text",
  prompt: "As a {{criticlevel}} movie critic, do you like {{movie}}?",
  labels: ["production"], // directly promote to production
  config: {
    model: "gpt-3.5-turbo",
    temperature: 0.7,
    supported_languages: ["en", "fr"],
  }, // optionally, add configs (e.g. model parameters or model tools) or tags
});
 
// Create a chat prompt
await langfuse.createPrompt({
  name: "movie-critic-chat",
  type: "chat",
  prompt: [
    { role: "system", content: "You are an {{criticlevel}} movie critic" },
    { role: "user", content: "Do you like {{movie}}?" },
  ],
  labels: ["production"], // directly promote to production
  config: {
    model: "gpt-3.5-turbo",
    temperature: 0.7,
    supported_languages: ["en", "fr"],
  }, // optionally, add configs (e.g. model parameters or model tools) or tags
});

If you already have a prompt with the same name, the prompt will be added as a new version.

Run a prompt experiment

Now that we have set up a prompt version and a dataset, we can run a prompt experiment in Langfuse for each prompt version that we want to test.

When viewing the prompt details or a dataset, click the New Experiment button to run a prompt experiment.

Select the prompt version, dataset, and model configuration that you want to test. Before running the experiment, you will see whether the prompt variables match the dataset variables.
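
The same kind of check can be approximated locally before opening the UI. The sketch below is an assumption-laden illustration, not the check Langfuse performs: `prompt_variables` and `compare_variables` are hypothetical helpers that compare the `{{variables}}` of a chat prompt with the keys of a dataset item's input.

```python
import re
from typing import Any, Dict, List, Set

# Matches {{name}} placeholders, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def prompt_variables(messages: List[Dict[str, str]]) -> Set[str]:
    """Collect every variable name used across the chat messages."""
    names: Set[str] = set()
    for message in messages:
        names.update(PLACEHOLDER.findall(message["content"]))
    return names


def compare_variables(
    messages: List[Dict[str, str]], item_input: Any
) -> Dict[str, List[str]]:
    """Report prompt variables without a matching key, and keys the prompt ignores."""
    required = prompt_variables(messages)
    # Only JSON objects (dicts) can provide named variables.
    provided = set(item_input) if isinstance(item_input, dict) else set()
    return {
        "missing": sorted(required - provided),
        "unused": sorted(provided - required),
    }


prompt = [
    {"role": "system", "content": "DOCUMENTATION\n{{documentation}}"},
    {"role": "user", "content": "{{question}}"},
]
matching = compare_variables(
    prompt, {"question": "What is Langfuse?", "documentation": "Langfuse docs"}
)
mismatched = compare_variables(prompt, {"text": "hello world"})
```

Here `matching` reports no missing or unused names, while `mismatched` lists `documentation` and `question` as missing and `text` as unused, because an input of `{"text": "hello world"}` provides none of the prompt's variables.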
