Awesome Agents

Google Is Using AI to Replace News Headlines in Search

Cursor's Composer 2 Is Kimi K2.5 With RL - And No Attribution

NVIDIA Open-Sources the Sandbox AI Agents Should Have Had

Shield AI Raises $2B at $12.7B in Defense AI Bet

Shield AI closed a $2B raise at a $12.7B valuation, more than doubling from $5.3B a year ago, to fund its Hivemind autonomous pilot software and acquire Pentagon simulation vendor Aechelon Technology.

GitHub Copilot Will Train on Your Code This April

Starting April 24, GitHub will use Copilot Free and Pro users' interaction data to train AI models by default - with opt-out buried in settings.

Tencent's 7B Voice AI Targets OpenAI's Realtime API

Tencent open-sources Covo-Audio, a 7B end-to-end audio language model with native full-duplex conversation that beats larger closed models on key benchmarks.

Anthropic Adds Auto Mode to Claude Code with Safety Gates

Anthropic's new Auto Mode for Claude Code uses a two-layer classifier to automatically approve or block risky commands, offering a middle path between manual approvals and full autonomy.

ARC-AGI-3 Launches - AI Agents Must Learn, Not Memorize

ARC Prize Foundation launched ARC-AGI-3 today with a fully open-source agent toolkit. The best AI in the preview phase scored 12.58% against a human baseline of 100%.

Apple Can Distill Google Gemini for On-Device Siri

New details reveal Apple has full data center access to Gemini and can create smaller on-device derivative models - far more control than the original deal disclosed.

New York's RAISE Act Is Law - AI Labs Have Until 2027

New York's RAISE Act is now on the books, requiring frontier AI developers to publish safety protocols, report incidents within 72 hours, and submit to annual audits by January 2027.

Kleiner Perkins Goes All-In on AI With $3.5B Raise

Kleiner Perkins closes a $3.5B dual fund - its largest raise in the current era - betting on Anthropic, Harvey, and a 2026 IPO window.

LiteLLM Was Hacked Through Its Own Vulnerability Scanner

The LiteLLM supply chain attack originated from Trivy - the security scanner in LiteLLM's CI/CD pipeline. TeamPCP compromised Trivy, stole the PyPI publishing token, and uploaded backdoored packages directly.

Guides

Which AI Model Should I Use? A Decision Guide

A practical guide to picking the right AI model for your needs, comparing ChatGPT, Claude, Gemini, Perplexity, and Copilot across writing, coding, research, and more.

Multimodal AI Explained - A Beginner's Guide

Multimodal AI can see, hear, and read at once - here's how it works and why it matters for everyday users.

How to Use AI for Academic Research and Writing

A practical guide to AI research tools that help you find papers, summarize findings, and write better academic work in less time.

Reviews

Kimi K2.5 Review: Open Weights, Agent Swarms, Caveats

Moonshot AI's Kimi K2.5 delivers best-in-class open-weight math and a genuinely novel multi-agent architecture, but a brutal hallucination rate and slow inference limit its real-world reliability.

Microsoft Phi-4 Reasoning: Small Model, Big Math

Microsoft's Phi-4 reasoning family delivers near-70B-class math performance in a 14B open-weight package, but the overthinking problem is real and the use case is narrower than the benchmarks suggest.

LTX-2.3 Review: Open-Source Video AI That Delivers

LTX-2.3 is a 22-billion-parameter open-source video and audio generation model from Lightricks that rivals closed commercial tools - at zero cloud cost.

Leaderboards

Computer Use Leaderboard: Desktop AI Agent Rankings

Rankings of the best AI models and agent frameworks on computer use benchmarks - OSWorld, OSWorld-Verified, and ScreenSpot-Pro - updated March 2026.

AI Safety Leaderboard: Refusal and Jailbreak Rankings

Rankings of AI models by safety metrics including refusal rates, jailbreak resistance, bias scores, and truthfulness across major benchmarks.

AI Voice and Speech Leaderboard: TTS and STT Rankings

Rankings of the best text-to-speech and speech-to-text AI models by naturalness, accuracy, latency, and pricing.

Models

Xiaomi MiMo-V2-Pro - Agentic 1T MoE Model

Xiaomi's MiMo-V2-Pro is a 1-trillion-parameter MoE model with 42B active params, 1M context, and agentic coding performance that rivals Claude Sonnet 4.6 at a fraction of the cost.

Claude Sonnet 4.6: Mid-Tier Model, Flagship Results

Anthropic's mid-tier model matches Opus 4.6 on computer use, leads all models on office productivity tasks, and costs five times less than the flagship at $3/$15 per million tokens.

Cohere Command A Vision: 112B Multimodal Model

Cohere Command A Vision is a 112B multimodal model that leads on document and OCR benchmarks, beating GPT-4.1 across seven visual understanding tasks.

Recent

Open-Source LLM Leaderboard: February 2026

Rankings of the best open-weight and open-source large language models in February 2026, including DeepSeek V3.2, Qwen 3.5, Llama 4 Maverick, GLM-5, and Mistral 3.

Gemini 3 Pro Review: Google's Vision AI Powerhouse

A detailed review of Google's Gemini 3 Pro, a natively multimodal AI model that leads in vision, spatial reasoning, and video understanding.

ChatGPT vs Claude vs Gemini: Which AI Assistant Should You Use?

Head-to-head comparison of ChatGPT, Claude, and Gemini in 2026. Pricing, strengths, weaknesses, and best use cases for each AI assistant.

What Are AI Agents? A Plain-English Explanation

A beginner-friendly explanation of AI agents, covering what makes them different from chatbots, real-world examples, key frameworks, and the growing agent economy.

MMLU-Pro Leaderboard: Graduate-Level Knowledge Rankings

Complete MMLU-Pro benchmark rankings measuring graduate-level knowledge across 14 subjects with 12,000 questions and 10 answer options per question.

GLM-5 Arrives: 744B-Parameter Open-Source Model Built for Agents

Z.ai releases GLM-5, a 744B parameter open-source Mixture-of-Experts model purpose-built for agentic tasks, scoring 77.8% on SWE-bench Verified and 56.2% on Terminal-Bench 2.0.

Cursor Review: The AI-Native IDE That Changed How We Code

A thorough review of Cursor, the VS Code fork that has become the gold standard for AI-assisted coding with Composer mode, full project understanding, and multi-file edits.

Best Tools for Running LLMs Locally in 2026

Compare the best tools for running large language models locally: Ollama, LM Studio, llama.cpp, GPT4All, and LocalAI. Includes hardware requirements and model recommendations.

OpenAI Starts Showing Ads to Free ChatGPT Users

OpenAI begins testing advertisements in ChatGPT for Free and Go tier users in the US, while Plus, Pro, Business, Enterprise, and Education plans remain ad-free.

DeepSeek V3.2 Review: GPT-5 Performance at a Fraction of the Cost

A thorough review of DeepSeek V3.2, the 671B parameter MoE model that delivers frontier-level performance at dramatically lower cost with an MIT license.

How to Run an Open-Source LLM on Your Own Computer

A practical tutorial on running open-source language models locally using Ollama, llama.cpp, and LM Studio, with hardware requirements and model recommendations.

Claude Code Review: Terminal-First AI Coding at Its Finest

A hands-on review of Anthropic's Claude Code CLI, a terminal-first AI coding assistant that excels at large refactors, architecture work, and complex multi-file projects.
