Evaluator Migration Guide
Langfuse now recommends running evaluators on observations for live-data LLM-as-a-Judge evaluations. This guide helps you migrate your existing live-data evaluators to the new system.
Why Migrate?
Benefits of Observation-Level Evaluators
1. Better Performance
- Reduced database load enables faster evaluation processing
- Scales better under high-volume workloads
2. Improved Reliability
- More predictable behavior with evaluation targeting specific operations
- Better error handling and retry logic
3. Greater Control
- Evaluate specific observations (LLM calls, tool invocations, etc.) rather than entire traces
- More precise filtering
- Easier debugging when evaluations fail
4. Future-Proof
- Built on Langfuse’s next-generation evaluation architecture
Understanding the Trade-offs
We recognize this migration may require work on your end. Here’s our perspective:
- You can keep running evaluators on traces: They will continue to work for the foreseeable future
- Some users benefit more than others: High-volume users or those with complex traces will see the biggest improvements
- This enables long-term improvements: The architectural change allows us to build better, simpler features for everyone
- We’re here to help: Use the built-in migration wizard and this guide
When to Migrate
✅ Migrate Now If:
- You are running on or planning to upgrade to SDK >= 4.4.0 (JavaScript) or >= 3.9.0 (Python)
- You are experiencing performance issues with current evaluators
- You are setting up new evaluators and want the best experience
⏸️ Wait If:
- You are blocked from upgrading to SDK >= 4.4.0 (JavaScript) or >= 3.9.0 (Python)
- Your current evaluators work perfectly for your use case
Migration Process
Step 1: Check Your SDK Version
Ensure you’re running a compatible SDK version:

pip show langfuse
# Required: >= 3.9.0

To upgrade:

pip install --upgrade langfuse

Step 2: Use the Upgrade Wizard
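If you want to gate a deployment script on the SDK version, the dotted-numeric comparison can be done with the standard library alone. A minimal sketch, assuming simple `X.Y.Z` version strings (the `3.9.0` minimum comes from this guide; in a real script you could obtain the installed version via `importlib.metadata.version("langfuse")`):

```python
def meets_minimum(installed: str, minimum: str = "3.9.0") -> bool:
    """Return True if `installed` is at least `minimum` (plain dotted-numeric versions only)."""
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(minimum)

# Examples:
print(meets_minimum("3.9.0"))   # True
print(meets_minimum("3.10.2"))  # True
print(meets_minimum("2.60.5"))  # False: v2 SDKs predate observation-level evaluators
```

Note that tuple comparison handles `3.10.2 >= 3.9.0` correctly, where a naive string comparison would not.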
Langfuse provides a built-in wizard to migrate your evaluators.
1. Navigate to your evaluators page
- Go to your project → Evaluation → LLM-as-a-Judge
- You’ll see a callout for evaluators marked “Legacy”
2. Click “Upgrade” on any legacy evaluator
- This opens the migration wizard
- The wizard shows your current configuration on the left
3. Review the migrated configuration
- Left side: Your current (legacy) configuration (read-only)
- Right side: Proposed configuration (editable)
4. Adjust the new configuration
- Filters: Add filters to narrow down the evaluation to a specific subset of data you’re interested in (observation type, trace name, trace tags, userId, sessionId, metadata, etc.)
- Variable Mapping: Map variables from observation fields (input, output, metadata) to your evaluation prompt
5. Choose what happens to the old evaluator
- Keep both active: Test the new evaluator alongside the old one
- Mark old as inactive (recommended initially): Old evaluator stops running, new one takes over
- Delete old evaluator: Permanently remove the legacy evaluator
Step 3: Verify Evaluator Execution
Verify the new evaluator works correctly:
1. Check execution metrics
- Go to Evaluator Table → find the new evaluator row → click “Logs”
- View execution logs
2. Compare results (if running both)
- Review scores from both the legacy and new evaluators. You might find our score analytics helpful for comparing results.
- Ensure consistency in evaluation logic
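If you run both evaluators in parallel for a while, a rough consistency check is to compare their score distributions. A minimal, self-contained Python sketch; the score lists below are illustrative placeholders, not actual Langfuse API output:

```python
from statistics import mean, stdev

# Illustrative numeric scores exported from each evaluator
# (e.g. via the Langfuse UI or API)
legacy_scores = [0.8, 0.9, 0.7, 0.85, 0.9]
new_scores = [0.82, 0.88, 0.72, 0.84, 0.9]

def summarize(name, scores):
    print(f"{name}: mean={mean(scores):.3f} stdev={stdev(scores):.3f}")

summarize("legacy", legacy_scores)
summarize("new", new_scores)

# A large gap between the means suggests the variable mapping or
# filters differ between the two configurations.
drift = abs(mean(legacy_scores) - mean(new_scores))
print(f"mean drift: {drift:.3f}")
```

A small drift is expected from LLM judge nondeterminism; a large one usually points at a mapping or filter mismatch rather than a model issue.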
Migration Examples
Example 1: Simple Trace Evaluator
Likely, your trace input/output is equivalent to an observation’s input/output within that same trace. Your evaluator should now target this observation directly. In this example, let’s assume you have a generation observation named “chat-completion” that holds the same input/output as your trace.
Before (Trace-level):
Target: Traces
Filter: trace.name = "chat-completion"
Variables:
- user_query: trace.input
- assistant_response: trace.output

After (Observation-level):
Target: Observations
Filter: trace.name = "chat-completion" AND observation.type = "generation" AND observation.name = "chat-completion"
Variables:
- user_query: observation.input
- assistant_response: observation.output

Key Changes:
- Additional filters at observation level to identify the specific observation you want to evaluate in the trace tree
- Variables come from the observation instead of the trace (e.g. observation.input and observation.output)
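To illustrate what the observation-level filter selects, here is a small self-contained Python sketch that applies the example filter to a list of observation records. Plain dicts stand in for Langfuse observations; this is not the actual SDK object model:

```python
observations = [
    {"type": "span", "name": "retrieve-docs", "input": "...", "output": "..."},
    {"type": "generation", "name": "chat-completion",
     "input": "What is Langfuse?", "output": "An LLM engineering platform."},
    {"type": "generation", "name": "title-generation", "input": "...", "output": "..."},
]

# Mirrors the example filter:
# observation.type = "generation" AND observation.name = "chat-completion"
targets = [
    o for o in observations
    if o["type"] == "generation" and o["name"] == "chat-completion"
]

for o in targets:
    # These are the fields the variable mapping draws from
    print("user_query:", o["input"])
    print("assistant_response:", o["output"])
```

Only the one matching generation is evaluated; the retrieval span and the title generation in the same trace are skipped.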
Troubleshooting
Variables Don’t Map Correctly
Problem: You were mapping variables from two different observations
Solution:
- If possible, store necessary context in a single observation metadata during instrumentation
- Consider breaking your single trace evaluator into multiple observation evaluators
- If neither workaround fits, do not migrate this evaluator yet. The new system does not yet have a direct equivalent for mappings that span multiple observations, but we are actively working on one.
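The first workaround above, consolidating context into a single observation’s metadata during instrumentation, can be sketched as follows. The helper data (`retrieved_docs`, `reranker_scores`) is hypothetical, and plain dicts stand in for SDK objects:

```python
# Context produced by two earlier steps in the trace (hypothetical data)
retrieved_docs = ["doc-12", "doc-47"]
reranker_scores = {"doc-12": 0.91, "doc-47": 0.64}

# Instead of mapping evaluator variables from two separate observations,
# attach everything the judge needs to the one observation you evaluate:
generation_observation = {
    "type": "generation",
    "name": "chat-completion",
    "input": "Summarize the retrieved documents.",
    "output": "placeholder model answer",
    "metadata": {
        "retrieved_docs": retrieved_docs,
        "reranker_scores": reranker_scores,
    },
}

# The evaluator can now map all its variables from observation.input,
# observation.output, and observation.metadata alone.
print(sorted(generation_observation["metadata"]))
```

With the context co-located like this, a single observation-level evaluator has everything it needs without reaching across the trace tree.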
SDK Version-Specific Guidance
For Users on Old SDK Versions (Python < 3.9.0, JavaScript < 4.4.0)
You have two options:
Option 1: Upgrade Your SDK (Recommended)
- Update to latest SDK version
- Migrate evaluators using the wizard
Option 2: Continue with Evaluators Running on Traces
- No changes needed
- Evaluators will continue to work
- Note: You may continue using trace evaluators on the new SDK version, but you will not get performance improvements.
For Users on New SDK Versions (Python >= 3.9.0, JavaScript >= 4.4.0)
If you have existing evaluators running on traces:
Option 1: Migrate to Observations (Recommended)
- Follow the migration wizard
- Get full benefits of new architecture
Getting Help
- Documentation: Refer to the LLM-as-a-Judge guide
- GitHub: Report issues at github.com/langfuse/langfuse
- Support: Enterprise customers can contact support@langfuse.com
Rollback Plan
If you need to revert after migration:
- If you kept both evaluators: Simply mark the new one as inactive
- If you deleted the old evaluator: Create a new evaluator with the old configuration
- Data is preserved: All historical evaluation results remain accessible
Last updated: January 29, 2026