Eclipse PanEval

Scope

<p>Eclipse PanEval provides a unified, vendor-neutral framework for evaluating AI models on capability, safety, and cybersecurity in line with EU regulations.</p><p>In-scope:<br>- A three-dimensional evaluation framework based on "Capacity – Task – Metrics"<br>- Coverage of major model categories: language, multimodal, and speech models<br>- Support for a range of evaluation tasks, including task solving, coding, multi-turn QA, factuality, image-text QA, depth estimation, and speech perception<br>- Safety &amp; robustness evaluation as a cross-cutting dimension across all model types<br>- Alignment with EU AI Act and Cyber Resilience Act (CRA) compliance requirements<br>- AI-assisted subjective evaluation to improve efficiency and objectivity<br>- Open leaderboard and evaluation platform (https://flageval.baai.ac.cn)</p><p>Out-of-scope:<br>- Model training or fine-tuning<br>- Deployment infrastructure for production AI systems<br>- Legal compliance certification (Eclipse PanEval provides evaluation tooling, not legal advice)</p>

Releases
Name Date
Reviews
Name Date
Creation Review 2026-05-06