Research

Corrupt Agent Scores, Memory Bottlenecks, Skill Evolution

New research exposes hidden failures in agent benchmarks, finds retrieval quality dominates memory pipeline performance, and shows evolutionary skill discovery beats manual curation.

Cheaper Thinking, Web Traps, Denoised Agents

Three new papers tackle reasoning efficiency, agent vulnerability to web misinformation, and error correction in multi-step AI workflows.

Speech Turing Tests, Smart Routing, Pseudocode Agents

New research reveals no speech AI passes a Turing test, adaptive routing slashes LLM costs by 82%, and pseudocode planning transforms agent reliability.

Anthropic: Better AI Output Means Worse Oversight

Anthropic's AI Fluency Index reveals that when Claude produces polished code and documents, users question its reasoning 5.6 times less often.

AI Research Roundup: Agent Behavioral Contracts, Autonomous Memory, and Certified Circuits

Three new papers tackle agent reliability through formal contracts, active knowledge acquisition for memory systems, and provably stable mechanistic interpretability.

Today in AI Research: Stable Agent Training, Compound AI Limits, and the Algorithm Trust Paradox

New papers tackle training collapse in agentic RL with a unified stabilization recipe, reveal when querying multiple models actually helps, and expose a paradox where LLMs claim to trust humans but bet on algorithms.
