Introduction to Langfuse Tracing
LLM applications use increasingly complex abstractions, such as chains, agents with tools, and advanced prompts. Nested traces in Langfuse help you understand what is happening and identify the root cause of problems.
Why Use LLM Observability & Tracing?
- Full context capture: Track the complete execution flow including API calls, context, prompts, parallelism and more
- Cost monitoring: Track model usage and costs across your application
- Quality insights: Collect user feedback and identify low-quality outputs
- Dataset creation: Build high-quality datasets for fine-tuning and testing
- Root cause analysis: Quickly identify and debug issues in complex LLM applications
Why Langfuse?
- Open source: Fully open source with public API for custom integrations
- Production optimized: Designed with minimal performance overhead
- Best-in-class SDKs: Native SDKs for Python and JavaScript
- Framework support: Integrated with popular frameworks like OpenAI SDK, LangChain, and LlamaIndex
- Multi-modal: Support for tracing text, images and other modalities
- Full platform: Suite of tools for the complete LLM application development lifecycle
Traces allow you to track every LLM call and other relevant logic in your app.
Introduction to Observability & Traces in Langfuse
A trace in Langfuse consists of the following objects:
- A trace typically represents a single request or operation. It contains the overall input and output of the function, as well as metadata about the request, such as the user, the session, and tags.
- Each trace can contain multiple observations to log the individual steps of the execution.
- Observations can be nested.
- Observations are of different types:
  - Events are the basic building blocks. They are used to track discrete events in a trace.
  - Spans represent durations of units of work in a trace.
  - Generations are spans used to log generations of AI models. They contain additional attributes about the model, the prompt, and the completion. For generations, token usage and costs are automatically calculated.
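The hierarchy above can be pictured as plain data. The following is an illustrative sketch only; the class and field names are simplified and do not mirror the actual Langfuse SDK objects:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Observation:
    name: str
    type: str  # "EVENT", "SPAN", or "GENERATION"
    children: List["Observation"] = field(default_factory=list)  # observations can be nested
    model: Optional[str] = None  # only meaningful for generations

@dataclass
class Trace:
    name: str
    user_id: Optional[str] = None
    observations: List[Observation] = field(default_factory=list)

# A single request traced as a span that contains a nested generation
trace = Trace(name="chat-request", user_id="user-123")
span = Observation(name="retrieve-context", type="SPAN")
span.children.append(Observation(name="llm-call", type="GENERATION", model="gpt-4o"))
trace.observations.append(span)
```

The key point is the nesting: a trace owns top-level observations, and any observation can contain child observations of any type.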
Hierarchical structure of traces in Langfuse
Example trace in Langfuse UI
Get Started
Follow the quickstart to add Langfuse tracing to your LLM app.
Advanced usage
You can extend the tracing capabilities of Langfuse by using the following features:
Enable/disable tracing
All Langfuse SDKs and integrations are designed to be non-intrusive. You can add Langfuse tracing to your application while being able to enable it only in specific environments.
By default, Langfuse tracing is enabled if an API key is set. You can manually disable tracing via the enabled flag. See the documentation for the specific SDK or integration for more details.
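As a sketch, the per-environment toggle could be computed like this. The helper function and the `APP_ENV` convention are our own illustration, not part of the SDK; only the resulting boolean would be passed to the client's enabled flag:

```python
import os

def tracing_enabled() -> bool:
    # Enable tracing only when credentials are configured and we are
    # not running locally (environment names here are illustrative).
    has_keys = bool(os.environ.get("LANGFUSE_SECRET_KEY"))
    env = os.environ.get("APP_ENV", "development")
    return has_keys and env in {"staging", "production"}

# The result would then be passed to the client, e.g.:
# langfuse = Langfuse(enabled=tracing_enabled())
```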
Event queuing/batching
Langfuse’s client SDKs and integrations are all designed to queue and batch requests in the background to optimize API calls and network time. Batches are determined by a combination of time and size (number of events and size of batch).
Configuration
All integrations have a sensible default configuration, but you can customise the batching behaviour to suit your needs.
Option (Python: SDK constructor / environment variable) | Option (JS) | Description |
---|---|---|
`flush_at` / `LANGFUSE_FLUSH_AT` | `flushAt` | The maximum number of events to batch before sending. |
`flush_interval` / `LANGFUSE_FLUSH_INTERVAL` (seconds) | `flushInterval` (milliseconds) | The maximum time to wait before sending a batch. |
You can, for example, set flushAt=1 to send every event immediately, or flushInterval=1000 to send a batch every second.
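Conceptually, the size-and-time batching behaves like this simplified sketch. It is not the real SDK implementation (the SDKs batch in a background thread); it only illustrates how flush_at and flush_interval interact:

```python
import time
from typing import Callable, List

class EventBatcher:
    """Simplified sketch of size/time-based batching (not the actual SDK code)."""

    def __init__(self, send: Callable[[List[dict]], None],
                 flush_at: int = 15, flush_interval: float = 0.5):
        self.send = send
        self.flush_at = flush_at
        self.flush_interval = flush_interval
        self.queue: List[dict] = []
        self.last_flush = time.monotonic()

    def add(self, event: dict) -> None:
        self.queue.append(event)
        # Flush when the batch is full or the interval has elapsed
        if (len(self.queue) >= self.flush_at
                or time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self) -> None:
        if self.queue:
            self.send(self.queue)
            self.queue = []
        self.last_flush = time.monotonic()

sent = []
batcher = EventBatcher(send=sent.append, flush_at=2)
batcher.add({"name": "event-1"})  # queued, batch not yet full
batcher.add({"name": "event-2"})  # batch full -> sent as one API call
```

Setting flush_at=1 in this model sends on every add, which matches the "send every event immediately" behaviour described above.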
Manual flushing
Flushing manually is especially relevant for short-lived applications like serverless functions: if you do not flush the client, you may lose events.
If you want to send a batch immediately, you can call the flush method on the client. In case of network issues, flush will log an error and retry the batch; it will never throw an exception.
# Decorator
from langfuse.decorators import langfuse_context
langfuse_context.flush()
# low-level SDK
langfuse.flush()
If you exit the application, use the shutdown method to make sure all requests are flushed and pending requests are awaited before the process exits. Once this function returns successfully, no more events will be sent to the Langfuse API.
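A common pattern is to register the shutdown at process exit with Python's standard atexit module. The stub client below stands in for the real Langfuse client purely for illustration:

```python
import atexit

class StubClient:
    """Stand-in for a client exposing a shutdown() method (illustrative only)."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        # The real client flushes pending events and awaits in-flight requests
        self.closed = True

client = StubClient()
atexit.register(client.shutdown)  # shutdown runs automatically when the process exits
```

With the real SDK you would register `langfuse.shutdown` the same way, so batched events are not lost on normal process termination.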
langfuse.shutdown()
FAQ
- How to use Langfuse Tracing in Serverless Functions (AWS Lambda, Vercel, Cloudflare Workers, etc.)
- How do I link prompt management with tracing in Langfuse to see which prompt versions were used?
- How to manage different environments in Langfuse?
- I have setup Langfuse, but I do not see any traces in the dashboard. How to solve this?