
Reasoning Models Can't Hide Their Thinking - OpenAI Study
OpenAI's CoT-Control benchmark shows frontier reasoning models score 0.1-15.4% at steering their own chain of thought - a result the company frames as good news for AI oversight.

New research shows that reasoning models can't suppress their chain-of-thought, that they commit to answers internally long before the CoT reveals them, and that static benchmarks are inadequate for measuring real-world agent adaptability.

Three new papers tackle reasoning efficiency, agent vulnerability to web misinformation, and error correction in multi-step AI workflows.

A plain-English guide to AI reasoning models - what they are, how they think step by step, and when you should actually use one.

Today's arXiv picks: a state-machine framework that makes GUI agents 12x cheaper to run, a training method that forces chain-of-thought to stay honest, and a KV-cache system that matches full quality at 1% of the memory.