DocsDatasets & ExperimentsPrompt Experiments

Prompt Experiments

Where is this feature available?
  • Hobby
  • Pro
  • Team
  • Self Hosted
    (Enterprise)

Prompt Experiments allows you to test a prompt version from Prompt Management on a Dataset of inputs and expected outputs. Thereby, you can verify that the change yields the expected outputs and does not cause regressions. You can directly analyze the results of different prompt experiments side-by-side.

Optionally, you can use LLM-as-a-Judge Evaluators to automatically evaluate the responses based on the expected outputs to further analyze the results on an aggregate level.

This is a no-code feature within Langfuse. You can run more complex experiments via the Langfuse SDKs/API. Follow this guide to get started.

Key benefits
  • Feedback loop: Quickly iterate on prompts by running experiments and directly comparing evaluation results side-by-side.
  • Regression prevention: When making prompt changes, run an experiment to make sure that the new version does not cause bad outputs.
Availability

Prompt Experiments is currently in public beta on Langfuse Cloud. It will be released for self-hosted users in Langfuse v3 (Pro plan) as it depends on parts of the new v3 infrastructure.

Overview

Introduction to Prompt Experiments

Setup

If you already have a dataset and a prompt, you can skip the following steps.

In Prompt Experiments, the items of a dataset are mapped to the variables of the prompt. In the following example, the variables (documentation and question) are mapped to the input of the dataset which is a JSON object. The expected output contains a reference answer for the given dataset item.

Configure LLM connection

Prompt Experiments runs LLM calls within Langfuse. Thus, you need to configure an LLM connection in the project settings.

Supported LLM providers
  • OpenAI, or OpenAI-compatible providers (e.g. LiteLLM, Google Vertex AI)
  • Anthropic
  • Azure OpenAI
  • AWS Bedrock

Create a dataset

Create a dataset with the inputs and expected outputs that you want to test your prompt on.

langfuse.create_dataset(
    name="<dataset_name>",
    # optional description
    description="My first dataset",
    # optional metadata
    metadata={
        "author": "Alice",
        "date": "2022-01-01",
        "type": "benchmark"
    }
)

See low-level SDK docs for details on how to initialize the Python client.

Create dataset items with test cases

Dataset items include the input variables that should be inserted into the prompt.

Example Dataset Item with variables
input
{
  "question": "What is Langfuse?",
  "documentation": "Langfuse - the LLM Engineering Platform"
}
expected_output
Langfuse is the LLM Engineering Platform.
langfuse.create_dataset_item(
    dataset_name="<dataset_name>",
    # any python object or value, optional
    input={
        "text": "hello world"
    },
    # any python object or value, optional
    expected_output={
        "text": "hello world"
    },
    # metadata, optional
    metadata={
        "model": "llama3",
    }
)

See low-level SDK docs for details on how to initialize the Python client.

Create a prompt with variables

Use {{variables}} to insert the dataset variables into the prompt during experiments.

Example Prompt
system
You are a Langfuse expert. Please answer questions based on the following documentation:

DOCUMENTATION
{{documentation}}
user
{{question}}

Run a prompt experiment

Now that we have set up a prompt version and a dataset, we can run a prompt experiment in Langfuse for each prompt version that we want to test.

When viewing the prompt details or a dataset, use the following button to run a prompt experiment:

New Experiment Button

Select the prompt version, dataset, and model configuration that you want to test. Before running the experiment, you will see whether the prompt variables match the dataset variables.

GitHub Discussions

Was this page useful?

Questions? We're here to help

Subscribe to updates