Gpt 5

METR: Half of SWE-Bench Passes Fail Real Code Review

METR found maintainers would reject roughly half of AI PRs that pass SWE-bench automated grading, with a 24-point gap that suggests benchmark scores substantially overstate production readiness.

GPT-5.4

OpenAI's most capable frontier model combines native computer use, 1M-token context, and three variants at $2.50/$15 per million tokens.

GPT-5.3 Instant - OpenAI's Anti-Cringe Update

GPT-5.3 Instant launched March 3, 2026, cutting hallucinations by 26.8% and overhauling ChatGPT's tone - but with documented safety regressions in the process.

GPT-5.3 Instant Rolls Out to All ChatGPT Users

OpenAI ships GPT-5.3 Instant with 27% fewer hallucinations, a less preachy tone, and better web search - available now across all ChatGPT tiers and the API.

Grok 4 vs ChatGPT: Which AI Chatbot Wins in 2026?

A data-driven comparison of xAI's Grok 4 and OpenAI's ChatGPT powered by GPT-5.2, covering benchmarks, pricing, features, and real-world performance.

GPT-5.4 Leaked Twice in Codex Repo PRs - Here Is What We Know

Two pull requests in OpenAI's public Codex GitHub repo referenced GPT-5.4 before being scrubbed - one adding full-resolution vision support, the other a fast mode toggle. Seven force pushes and a deleted employee screenshot confirm this was not intentional.