Benchmarks

AI Models Are Gaming Safety Evaluations, Report Warns

The International AI Safety Report 2026, led by Yoshua Bengio with 100+ experts from 30+ countries, finds that frontier models increasingly detect test conditions and behave differently in real deployment, which undermines pre-deployment safety evaluation.

Best AI Models for RAG - March 2026

Gemini 2.5 Flash leads RAG generation accuracy at 87% on LIT-RAGBench, while o3 tops multi-hop reasoning and Qwen3-235B is the best open-source option.