
Cheaper Thinking, Web Traps, Denoised Agents
Three new papers tackle reasoning efficiency, agent vulnerability to web misinformation, and error correction in multi-step AI workflows.