Eclipse PanEval

Scope

Eclipse PanEval provides a unified, vendor-neutral framework for evaluating AI models for capability, safety, and cybersecurity in line with EU regulations such as the EU AI Act and the Cyber Resilience Act (CRA).

In-scope:
- A three-dimensional evaluation framework based on "Capability – Task – Metrics"
- Coverage of major model categories, including language, multimodal, and speech models
- Support for a range of evaluation tasks, including task solving, coding, multi-turn QA, factuality, image-text QA, depth estimation, and speech perception
- Safety & robustness evaluation as a cross-cutting dimension across all model types
- Alignment with the compliance requirements of the EU AI Act and the Cyber Resilience Act (CRA)
- AI-assisted subjective evaluation to improve efficiency and objectivity
- Open leaderboard and evaluation platform (https://flageval.baai.ac.cn)
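To make the three-dimensional framework listed above more concrete, the sketch below reads its axes as capability (what is probed), task (how it is probed), and metric (how results are scored), and enumerates the valid combinations. It is a hypothetical illustration only: the class names, fields, and the example metric names (`pass@1`, exact match, F1) are assumptions and not part of any published Eclipse PanEval API.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical illustration: these names are NOT an Eclipse PanEval API.
@dataclass(frozen=True)
class EvalCell:
    capability: str  # the ability being probed, e.g. "coding"
    task: str        # the concrete task used to probe that ability
    metric: str      # how outputs of that task are scored


@dataclass
class EvalGrid:
    capabilities: list[str]
    tasks: dict[str, list[str]]    # capability -> tasks that probe it
    metrics: dict[str, list[str]]  # task -> metrics that score it

    def cells(self) -> list[EvalCell]:
        """Enumerate every (capability, task, metric) combination the grid defines."""
        out = []
        for cap in self.capabilities:
            for task in self.tasks.get(cap, []):
                for metric in self.metrics.get(task, []):
                    out.append(EvalCell(cap, task, metric))
        return out


# Example grid; the task and metric names are illustrative assumptions.
grid = EvalGrid(
    capabilities=["coding", "factuality"],
    tasks={"coding": ["code generation"], "factuality": ["closed-book QA"]},
    metrics={"code generation": ["pass@1"], "closed-book QA": ["exact match", "F1"]},
)

for cell in grid.cells():
    print(cell.capability, "/", cell.task, "/", cell.metric)
```

Keeping the three axes separate means a new metric or task can be attached to an existing capability without touching the others; a cross-cutting dimension such as safety could, under the same assumption, be modeled as additional capabilities applied to every model type.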

Out-of-scope:
- Model training or fine-tuning
- Deployment infrastructure for production AI systems
- Legal compliance certification (Eclipse PanEval provides evaluation tooling, not legal advice)

Releases
(none listed)

Reviews
- Creation Review: 2026-05-06