LLM

Nemotron-Cascade 2: 30B Open MoE, One GPU, Beats 120B

NVIDIA's new Nemotron-Cascade-2-30B-A3B activates just 3B parameters per token, runs on a single RTX 4090, and outscores NVIDIA's own 120B model on coding and math benchmarks.

Interpretability Limits, Dark Models, Persona Traps

Three new papers expose a gap between what AI models know and what they do - and why that gap is harder to close than anyone assumed.

Claude Sonnet 4.6: Mid-Tier Model, Flagship Results

Anthropic's mid-tier model matches Opus 4.6 on computer use, leads all models on office productivity tasks, and, at $3/$15 per million tokens, costs a fifth as much as the flagship.

Gemini 3.1 Flash-Lite Review: Fast, Cheap, and Capable

Google's Gemini 3.1 Flash-Lite delivers frontier-class benchmark scores at a fraction of the cost of Pro - but a sluggish first-token response and preview-only status mean it's not for every workload.

Hundreds of LLM-Written GitHub Repos Are Malware

We ran the GitHub search query from a researcher's blog post and confirmed 300+ malicious repositories with AI-generated READMEs distributing info-stealers - with the real number likely north of 1,000.

Best AI Models for RAG - March 2026

Gemini 2.5 Flash leads RAG generation accuracy at 87% on LIT-RAGBench, while o3 tops multi-hop reasoning and Qwen3-235B is the best open-source option.
