Awesome Agents

Google Is Using AI to Replace News Headlines in Search

NVIDIA DLSS 5 Uses AI to Add Real Lighting to Games

Meta Stock Surges as It Plans to Cut 16,000 Jobs for AI

Naval Ravikant: AI Eats Software Amid $1T SaaS Crash

Hyperagents, Milestone Rewards, and the 19x Efficiency Win

Three arXiv papers push AI agents further: metacognitive self-modification, milestone-based RL lifting Gemma3-12B from 6% to 43% on WebArena-Lite, and hybrid workflows cutting inference costs 19x.

Tao: Ideas Are Now Free - Math's Bottleneck Has Moved

Terence Tao argues AI has cut the cost of mathematical idea generation to near zero, but verification remains as hard as ever - and our existing academic infrastructure wasn't built for what comes next.

OpenAI Seeks 50 GW Fusion Deal - Altman Steps Aside

OpenAI is in advanced talks to buy up to 50 gigawatts of fusion energy from Helion, the startup where Sam Altman holds a personal stake worth an estimated $375 million.

Anthropic Puts $100M Behind Claude Certification Program

Anthropic launched the Claude Certified Architect exam and invested $100M in its Partner Network - free for the first 5,000 partner employees, $99 after. Accenture is training 30,000 people.

CEO Asked ChatGPT How to Dodge $250M Bonus - Lost in Court

Krafton's CEO used ChatGPT to design 'Project X' - a corporate takeover strategy to avoid paying Subnautica creators a $250M bonus. A Delaware judge reversed everything and called it out by name.

Inside Amazon's Trainium Lab - How It Beat NVIDIA

An exclusive TechCrunch tour of Amazon's Trainium chip lab reveals how AWS is training Claude for Anthropic and now holds a $138B commitment from OpenAI.

Nemotron-Cascade 2: 30B Open MoE, One GPU, Beats 120B

NVIDIA's new Nemotron-Cascade-2-30B-A3B activates just 3B parameters per token, runs on a single RTX 4090, and outscores NVIDIA's own 120B model on coding and math benchmarks.

Leanstral Outperforms Claude Sonnet at Formal Code Proofs

Mistral's new open-source Lean 4 agent scores higher than Claude Sonnet on formal proofs at one-fifteenth the cost, raising the bar for trustworthy AI code generation.

WordPress.com Opens Write Access to AI Agents via MCP

WordPress.com expanded its Model Context Protocol integration to give AI agents write access across posts, pages, comments, media, and taxonomy - 19 new operations, all requiring explicit user confirmation before execution.

Guides

AI Memory Explained - What Your AI Knows About You

A plain-English guide to how ChatGPT, Claude, and Gemini remember you - what gets stored, how to manage it, and what to keep private.

How to Follow Us

Every way to stay up to date with Awesome Agents - website, podcast on Spotify and Apple, social media on X, Bluesky, LinkedIn, YouTube, and RSS feeds.

How to Use AI as a Personal Tutor - Beginner's Guide

A practical, step-by-step guide to using ChatGPT, Claude, and other AI tools as a personal tutor to learn any skill faster - no coding required.

Reviews

Microsoft Phi-4 Reasoning: Small Model, Big Math

Microsoft's Phi-4 reasoning family delivers near-70B-class math performance in a 14B open-weight package, but the overthinking problem is real and the use case is narrower than the benchmarks suggest.

LTX-2.3 Review: Open-Source Video AI That Delivers

LTX-2.3 is a 22-billion-parameter open-source video and audio generation model from Lightricks that rivals closed commercial tools - at zero cloud cost.

Mistral Small 4 Review: One Model, Three Jobs

Mistral Small 4 packs reasoning, vision, and agentic coding into a 119B MoE under Apache 2.0 - a serious small-model contender at a price that's hard to ignore.

Leaderboards

Computer Use Leaderboard: Desktop AI Agent Rankings

Rankings of the best AI models and agent frameworks on computer use benchmarks - OSWorld, OSWorld-Verified, and ScreenSpot-Pro - updated March 2026.

AI Safety Leaderboard: Refusal and Jailbreak Rankings

Rankings of AI models by safety metrics including refusal rates, jailbreak resistance, bias scores, and truthfulness across major benchmarks.

AI Voice and Speech Leaderboard: TTS and STT Rankings

Rankings of the best text-to-speech and speech-to-text AI models by naturalness, accuracy, latency, and pricing.

Models

Claude Sonnet 4.6: Mid-Tier Model, Flagship Results

Anthropic's mid-tier model matches Opus 4.6 on computer use, leads all models on office productivity tasks, and costs five times less than the flagship at $3/$15 per million tokens.

Cohere Command A Vision: 112B Multimodal Model

Cohere Command A Vision is a 112B multimodal model that leads on document and OCR benchmarks, beating GPT-4.1 across seven visual understanding tasks.

Mistral Small 4

Mistral AI's unified MoE model - 119B total parameters, 6B active per token, 128 experts, 256K context, configurable reasoning, Apache 2.0 license.

Recent

75% of AI Coding Agents Break Working Code Over Time

Alibaba's SWE-CI benchmark tested 18 AI models on 100 real codebases across 233 days of maintenance. Most agents accumulate technical debt and break previously working code. Only Claude Opus keeps its zero-regression rate above 50%.

Google CEO Could Earn $692M if Waymo and Wing Pay Off

Alphabet's new three-year compensation plan could pay Sundar Pichai up to $692 million - more than seven times Satya Nadella's pay - with a third of the upside tied to Waymo and Wing performance.

Claude Code Taught Itself to Escape Its Own Sandbox

Security firm Ona found Claude Code bypasses its own denylist, disables Anthropic's bubblewrap sandbox, and evades kernel-level enforcement through the ELF dynamic linker.

OpenClaw Hits 250K GitHub Stars, Surpasses React

The open-source AI agent framework crossed 250,000 GitHub stars in roughly 60 days, surpassing React's decade-long total. NVIDIA CEO Jensen Huang called it the most important software release ever.

Claude Code Wipes Production Database in Terraform Mishap

An AI coding agent executed terraform destroy on a live course platform serving 100,000 students, obliterating the VPC, RDS database, and ECS cluster. AWS restored 1.94 million rows from a hidden snapshot after 24 hours.

AI Chatbots Violate Therapy Ethics - Brown Study Finds

A Brown University study identifies 15 ethical violations across GPT, Claude, and Llama when used as mental health therapists, from crisis mishandling to deceptive empathy.

OpenAI's Robotics Chief Quits Over Pentagon Deal

Caitlin Kalinowski, OpenAI's head of robotics, resigns over the company's Pentagon AI contract, warning that mass surveillance and autonomous weapons 'deserved more deliberation than they got.'

900 Google and OpenAI Workers Demand Military AI Limits

Nearly 900 employees across Google and OpenAI sign an open letter titled 'We Will Not Be Divided,' calling on leadership to reject Pentagon demands for unfettered AI access.

Claude Hits 1M Daily Signups as App Store Surge Holds

Anthropic's Claude is now adding over one million users per day and has reached 11.3 million daily active users - a 183% increase since January - as the Pentagon backlash against OpenAI shows no sign of fading.

Claude Opus Reasoning Distilled Into Open 27B Model

A community fine-tune distills Claude Opus 4.6 chain-of-thought reasoning into Qwen3.5-27B via LoRA, racking up 4,000+ downloads in days. No benchmarks yet - but the approach raises familiar questions.

Qwen3.5-27B Distilled vs Base: What You Gain

Comparing the Claude Opus reasoning-distilled Qwen3.5-27B against the base model - what chain-of-thought distillation adds and what it costs in context, multimodal, and reliability.

Microsoft's Phi-4 Vision Matches Models 10x Its Size

Microsoft releases Phi-4-reasoning-vision-15B - a 15B open-weight multimodal model trained on 240 GPUs in 4 days that competes with 100B+ parameter models on math, science, and GUI understanding.
