Corrupt Agent Scores, Memory Bottlenecks, Skill EvolutionNew research exposes hidden failures in agent benchmarks, finds retrieval quality dominates memory pipeline performance, and shows evolutionary skill discovery beats manual curation.
AI Research Roundup: Agent Behavioral Contracts, Autonomous Memory, and Certified CircuitsThree new papers tackle agent reliability through formal contracts, active knowledge acquisition for memory systems, and provably stable mechanistic interpretability.