arXiv

Speech Turing Tests, Smart Routing, Pseudocode Agents

New research reveals that no speech AI passes a Turing test, adaptive routing slashes LLM costs by 82%, and pseudocode planning transforms agent reliability.

AI Research Roundup: Agent Behavioral Contracts, Autonomous Memory, and Certified Circuits

Three new papers tackle agent reliability through formal contracts, active knowledge acquisition for memory systems, and provably stable mechanistic interpretability.

Today in AI Research: Stable Agent Training, Compound AI Limits, and the Algorithm Trust Paradox

New papers tackle training collapse in agentic RL with a unified stabilization recipe, reveal when querying multiple models actually helps, and expose a paradox where LLMs claim to trust humans but bet on algorithms.

Programmatic GUI Agents, Faithful Chain-of-Thought, and the 1% KV Cache

Today's arXiv picks: a state-machine framework that makes GUI agents 12x cheaper, a training method that forces chain-of-thought to be honest, and a KV cache system that matches full quality at 1% of the memory.

Today in AI Research: Sycophancy Traps, Data Analysis Mirages, and the Scaling Wall for Agents

New papers show chatbot sycophancy causes delusional spiraling even in rational users, AI data analysts produce wildly different conclusions from the same dataset, and test-time scaling fails for general-purpose agents.

This Week in AI Research: When Agents Try Science, Smarter Synthetic Data, and Protecting Your Model's Reasoning

Three papers that matter this week: a brutal benchmark for AI research agents, a feature-space approach to training data diversity, and trace rewriting to stop model theft.
