Evaluator Migration Guide
Langfuse now recommends running evaluators on observations for live-data LLM-as-a-Judge evaluations. This guide helps you migrate your existing live-data evaluators to the new system.
Why Migrate?
Benefits of Observation-Level Evaluators
1. Better Performance
- Reduced database load enables faster evaluation processing
- Scales better under high-volume workloads
2. Improved Reliability
- More predictable behavior with evaluation targeting specific operations
- Better error handling and retry logic
3. Greater Control
- Evaluate specific observations (LLM calls, tool invocations, etc.) rather than entire traces
- More precise filtering
- Easier debugging when evaluations fail
4. Future-Proof
- Built on Langfuse’s next-generation evaluation architecture
Understanding the Trade-offs
We recognize this migration may require work on your end. Here’s our perspective:
- You can keep running evaluators on traces: They will continue to work for the foreseeable future
- Some users benefit more than others: High-volume users or those with complex traces will see the biggest improvements
- This enables long-term improvements: The architectural change allows us to build better, simpler features for everyone
- We’re here to help: Use the built-in migration wizard and this guide
When to Migrate
✅ Migrate Now If:
- You are running on or planning to upgrade to SDK >= 4.4.0 (JavaScript) or >= 3.9.0 (Python)
- You are experiencing performance issues with current evaluators
- You are setting up new evaluators and want the best experience
⏸️ Wait If:
- You are blocked from upgrading to SDK >= 4.4.0 (JavaScript) or >= 3.9.0 (Python)
- Your current evaluators work perfectly for your use case
Migration Process
Step 1: Check Your SDK Version
Ensure you’re running a compatible SDK version:

pip show langfuse
# Required: >= 3.9.0

To upgrade:

pip install --upgrade langfuse

Step 2: Use the Upgrade Wizard
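If you want to gate a deployment script on the SDK version, the dotted-numeric comparison can be done with the standard library alone. A minimal sketch, assuming simple `X.Y.Z` version strings (the `3.9.0` minimum comes from this guide; in a real script you could obtain the installed version via `importlib.metadata.version("langfuse")`):

```python
def meets_minimum(installed: str, minimum: str = "3.9.0") -> bool:
    """Return True if `installed` is at least `minimum` (plain dotted-numeric versions only)."""
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(minimum)

# Examples:
print(meets_minimum("3.9.0"))   # True
print(meets_minimum("3.10.2"))  # True
print(meets_minimum("2.60.5"))  # False: v2 SDKs predate observation-level evaluators
```

Note that tuple comparison handles `3.10.2 >= 3.9.0` correctly, where a naive string comparison would not.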
Langfuse provides a built-in wizard to migrate your evaluators.
1. Navigate to your evaluators page
- Go to your project → Evaluation → LLM-as-a-Judge
- You’ll see a callout for evaluators marked “Legacy”
2. Click “Upgrade” on any legacy evaluator
- This opens the migration wizard
- The wizard shows your current configuration on the left
3. Review the migrated configuration
- Left side: Your current (legacy) configuration (read-only)
- Right side: Proposed configuration (editable)
4. Adjust the new configuration
- Filters: Add filters to narrow down the evaluation to a specific subset of data you’re interested in (observation type, trace name, trace tags, userId, sessionId, metadata, etc.)
- Variable Mapping: Map variables from observation fields (input, output, metadata) to your evaluation prompt
5. Choose what happens to the old evaluator
- Keep both active: Test the new evaluator alongside the old one
- Mark old as inactive (recommended initially): Old evaluator stops running, new one takes over
- Delete old evaluator: Permanently remove the legacy evaluator
Step 3: Verify Evaluator Execution
Verify the new evaluator works correctly:
1. Check execution metrics
- Go to Evaluator Table → find the new evaluator row → click “Logs”
- View execution logs
2. Compare results (if running both)
- Review scores from both the legacy and new evaluators. You might find our score analytics helpful for comparing results.
- Ensure consistency in evaluation logic
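If you run both evaluators in parallel for a while, a rough consistency check is to compare their score distributions. A minimal, self-contained Python sketch; the score lists below are illustrative placeholders, not actual Langfuse API output:

```python
from statistics import mean, stdev

# Illustrative numeric scores exported from each evaluator
# (e.g. via the Langfuse UI or API)
legacy_scores = [0.8, 0.9, 0.7, 0.85, 0.9]
new_scores = [0.82, 0.88, 0.72, 0.84, 0.9]

def summarize(name, scores):
    print(f"{name}: mean={mean(scores):.3f} stdev={stdev(scores):.3f}")

summarize("legacy", legacy_scores)
summarize("new", new_scores)

# A large gap between the means suggests the variable mapping or
# filters differ between the two configurations.
drift = abs(mean(legacy_scores) - mean(new_scores))
print(f"mean drift: {drift:.3f}")
```

A small drift is expected from LLM judge nondeterminism; a large one usually points at a mapping or filter mismatch rather than a model issue.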
Migration Examples
Example 1: Simple Trace Evaluator
Likely, your trace input/output is equivalent to an observation’s input/output within that same trace. Your evaluator should now target this observation directly. In this example, let’s assume you have a generation observation named “chat-completion” that holds the same input/output as your trace.
Before (Trace-level):
Target: Traces
Filter: trace.name = "chat-completion"
Variables:
- user_query: trace.input
- assistant_response: trace.output

After (Observation-level):
Target: Observations
Filter: trace.name = "chat-completion" AND observation.type = "generation" AND observation.name = "chat-completion"
Variables:
- user_query: observation.input
- assistant_response: observation.output

Key Changes:
- Additional filters at observation level to identify the specific observation you want to evaluate in the trace tree
- Variables come from the observation instead of the trace (e.g. observation.input and observation.output)
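To illustrate what the observation-level filter selects, here is a small self-contained Python sketch that applies the example filter to a list of observation records. Plain dicts stand in for Langfuse observations; this is not the actual SDK object model:

```python
observations = [
    {"type": "span", "name": "retrieve-docs", "input": "...", "output": "..."},
    {"type": "generation", "name": "chat-completion",
     "input": "What is Langfuse?", "output": "An LLM engineering platform."},
    {"type": "generation", "name": "title-generation", "input": "...", "output": "..."},
]

# Mirrors the example filter:
# observation.type = "generation" AND observation.name = "chat-completion"
targets = [
    o for o in observations
    if o["type"] == "generation" and o["name"] == "chat-completion"
]

for o in targets:
    # These are the fields the variable mapping draws from
    print("user_query:", o["input"])
    print("assistant_response:", o["output"])
```

Only the one matching generation is evaluated; the retrieval span and the title generation in the same trace are skipped.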
Troubleshooting
Variables Don’t Map Correctly
Problem: You were mapping variables from two different observations
Solution:
- If possible, store necessary context in a single observation metadata during instrumentation
- Consider breaking your single trace evaluator into multiple observation evaluators
- If neither workaround fits, do not migrate this evaluator yet. The new system does not yet have a direct equivalent for mappings that span multiple observations, but we are actively working on one.
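The first workaround above, consolidating context into a single observation’s metadata during instrumentation, can be sketched as follows. The helper data (`retrieved_docs`, `reranker_scores`) is hypothetical, and plain dicts stand in for SDK objects:

```python
# Context produced by two earlier steps in the trace (hypothetical data)
retrieved_docs = ["doc-12", "doc-47"]
reranker_scores = {"doc-12": 0.91, "doc-47": 0.64}

# Instead of mapping evaluator variables from two separate observations,
# attach everything the judge needs to the one observation you evaluate:
generation_observation = {
    "type": "generation",
    "name": "chat-completion",
    "input": "Summarize the retrieved documents.",
    "output": "placeholder model answer",
    "metadata": {
        "retrieved_docs": retrieved_docs,
        "reranker_scores": reranker_scores,
    },
}

# The evaluator can now map all its variables from observation.input,
# observation.output, and observation.metadata alone.
print(sorted(generation_observation["metadata"]))
```

With the context co-located like this, a single observation-level evaluator has everything it needs without reaching across the trace tree.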
SDK Version-Specific Guidance
For Users on Old SDK Versions (Python < 3.9.0, JavaScript < 4.4.0)
You have two options:
Option 1: Upgrade Your SDK (Recommended)
- Update to latest SDK version
- Migrate evaluators using the wizard
Option 2: Continue with Evaluators Running on Traces
- No changes needed
- Evaluators will continue to work
- Note: You may continue using trace evaluators on the new SDK version, but you will not get performance improvements.
For Users on New SDK Versions (Python >= 3.9.0, JavaScript >= 4.4.0)
If you have existing evaluators running on traces:
Option 1: Migrate to Observations (Recommended)
- Follow the migration wizard
- Get full benefits of new architecture
Getting Help
- Documentation: Refer to the LLM-as-a-Judge guide
- GitHub: Report issues at github.com/langfuse/langfuse
- Support: Enterprise customers can contact support@langfuse.com
Rollback Plan
If you need to revert after migration:
- If you kept both evaluators: Simply mark the new one as inactive
- If you deleted the old evaluator: Create a new evaluator with the old configuration
- Data is preserved: All historical evaluation results remain accessible
Last updated: January 29, 2026