Scope
<p>Eclipse PanEval provides a unified, vendor-neutral framework for evaluating AI models on capability, safety, and cybersecurity in line with EU regulations.</p><p>In-scope:<br>- A three-dimensional evaluation framework based on "Capability – Task – Metrics"<br>- Coverage of major model categories, including language, multimodal, and speech models<br>- Support for a range of evaluation tasks, such as task solving, coding, multi-turn QA, factuality, image-text QA, depth estimation, and speech perception<br>- Safety and robustness evaluation as a cross-cutting dimension across all model types<br>- Alignment with the compliance requirements of the EU AI Act and the Cyber Resilience Act (CRA)<br>- AI-assisted subjective evaluation to improve efficiency and objectivity<br>- Open leaderboard and evaluation platform (https://flageval.baai.ac.cn)</p><p>Out-of-scope:<br>- Model training or fine-tuning<br>- Deployment infrastructure for production AI systems<br>- Legal compliance certification (Eclipse PanEval provides evaluation tooling, not legal advice)</p>