Awesome Agents

Google Is Using AI to Replace News Headlines in Search

NVIDIA DLSS 5 Uses AI to Add Real Lighting to Games

Meta Stock Surges as It Plans to Cut 16,000 Jobs for AI

Naval Ravikant: AI Eats Software Amid $1T SaaS Crash

USCC: China's Open-Source AI Now Runs 80% of US Startups

A new USCC report finds Chinese open-source models now dominate US AI startup stacks, with Qwen surpassing Llama in global downloads and Chinese models taking 41% of all Hugging Face downloads.

Hyperagents, Milestone Rewards, and the 19x Efficiency Win

Three arXiv papers push AI agents further: metacognitive self-modification, milestone-based RL lifting Gemma3-12B from 6% to 43% on WebArena-Lite, and hybrid workflows cutting inference costs 19x.

Tao: Ideas Are Now Free - Math's Bottleneck Has Moved

Terence Tao argues AI has cut the cost of mathematical idea generation to near zero, but verification remains as hard as ever - and our existing academic infrastructure wasn't built for what comes next.

OpenAI Seeks 50 GW Fusion Deal - Altman Steps Aside

OpenAI is in advanced talks to buy up to 50 gigawatts of fusion energy from Helion, the startup where Sam Altman holds a personal stake worth an estimated $375 million.

Anthropic Puts $100M Behind Claude Certification Program

Anthropic launched the Claude Certified Architect exam and invested $100M in its Partner Network - free for the first 5,000 partner employees, $99 after. Accenture is training 30,000 people.

CEO Asked ChatGPT How to Dodge $250M Bonus - Lost in Court

Krafton's CEO used ChatGPT to design 'Project X' - a corporate takeover strategy to avoid paying Subnautica creators a $250M bonus. A Delaware judge reversed everything and called it out by name.

Inside Amazon's Trainium Lab - How It Beat NVIDIA

An exclusive TechCrunch tour of Amazon's Trainium chip lab reveals how AWS is training Claude for Anthropic and now holds a $138B commitment from OpenAI.

Nemotron-Cascade 2: 30B Open MoE, One GPU, Beats 120B

NVIDIA's new Nemotron-Cascade-2-30B-A3B activates just 3B parameters per token, runs on a single RTX 4090, and outscores NVIDIA's own 120B model on coding and math benchmarks.

Leanstral Outperforms Claude Sonnet at Formal Code Proofs

Mistral's new open-source Lean 4 agent scores higher than Claude Sonnet on formal proofs at one-fifteenth the cost, raising the bar for trustworthy AI code generation.

Guides

AI Memory Explained - What Your AI Knows About You

A plain-English guide to how ChatGPT, Claude, and Gemini remember you - what gets stored, how to manage it, and what to keep private.

How to Follow Us

Every way to stay up to date with Awesome Agents - website, podcast on Spotify and Apple, social media on X, Bluesky, LinkedIn, YouTube, and RSS feeds.

How to Use AI as a Personal Tutor - Beginner's Guide

A practical, step-by-step guide to using ChatGPT, Claude, and other AI tools as a personal tutor to learn any skill faster - no coding required.

Reviews

Microsoft Phi-4 Reasoning: Small Model, Big Math

Microsoft's Phi-4 reasoning family delivers near-70B-class math performance in a 14B open-weight package, but the overthinking problem is real and the use case is narrower than the benchmarks suggest.

LTX-2.3 Review: Open-Source Video AI That Delivers

LTX-2.3 is a 22-billion-parameter open-source video and audio generation model from Lightricks that rivals closed commercial tools - at zero cloud cost.

Mistral Small 4 Review: One Model, Three Jobs

Mistral Small 4 packs reasoning, vision, and agentic coding into a 119B MoE under Apache 2.0 - a serious small-model contender at a price that's hard to ignore.

Leaderboards

Computer Use Leaderboard: Desktop AI Agent Rankings

Rankings of the best AI models and agent frameworks on computer use benchmarks - OSWorld, OSWorld-Verified, and ScreenSpot-Pro - updated March 2026.

AI Safety Leaderboard: Refusal and Jailbreak Rankings

Rankings of AI models by safety metrics including refusal rates, jailbreak resistance, bias scores, and truthfulness across major benchmarks.

AI Voice and Speech Leaderboard: TTS and STT Rankings

Rankings of the best text-to-speech and speech-to-text AI models by naturalness, accuracy, latency, and pricing.

Models

Claude Sonnet 4.6: Mid-Tier Model, Flagship Results

Anthropic's mid-tier model matches Opus 4.6 on computer use, leads all models on office productivity tasks, and costs five times less than the flagship at $3/$15 per million tokens.

Cohere Command A Vision: 112B Multimodal Model

Cohere Command A Vision is a 112B multimodal model that leads on document and OCR benchmarks, beating GPT-4.1 across seven visual understanding tasks.

Mistral Small 4

Mistral AI's unified MoE model - 119B total parameters, 6B active per token, 128 experts, 256K context, configurable reasoning, Apache 2.0 license.

Recent

Apple Asks Google to Run Siri After Cloud Bet Fails

Apple has asked Google to host next-gen Siri on its cloud servers after Private Cloud Compute hit just 10 percent utilization, with some servers still sitting in warehouses.

Gemini 3.1 Pro Tops Benchmarks but Developers Can't Rely on It

Gemini 3.1 Pro leads ARC-AGI-2, LiveCodeBench, and 11 other benchmarks with 750 million users and 21.5% market share - but developers report stalled responses, leaked thinking tokens, and API outages that make it unusable for production coding and agent workflows.

Google Launches Gemini 3.1 Flash-Lite as Gemini 3 Pro Dies
