
NVIDIA Nemotron 3 Super 120B-A12B
NVIDIA Nemotron 3 Super is a 120B-parameter open model with 12B parameters active per token at inference, combining Mamba-2, LatentMoE, and Multi-Token Prediction for agentic workloads with a 1M-token context window.

NVIDIA releases Nemotron 3 Super, a 120B-parameter open model with only 12B parameters active per token at inference. The architecture combines Mamba-2 and Transformer layers with LatentMoE and Multi-Token Prediction, targeting agentic AI workloads with a 1M-token context window.
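
The "A12B" suffix is the economics claim here: with expert routing, per-token compute scales with active rather than total parameters. Below is a minimal top-k MoE routing sketch in NumPy; the expert count, top-k value, and layer sizes are illustrative assumptions, not NVIDIA's configuration, and it does not model LatentMoE's latent routing or the Mamba-2 layers.

```python
# Minimal top-k mixture-of-experts routing sketch. All sizes are toy
# values chosen for illustration, not Nemotron's actual configuration.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256      # toy hidden and feed-forward dimensions
n_experts, top_k = 16, 2     # route each token to 2 of 16 experts

# Each expert is a small two-layer MLP (weights only, biases omitted).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector x through its top-k experts only."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]          # indices of chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over chosen experts
    out = np.zeros_like(x)
    for gate, idx in zip(gates, top):
        w_in, w_out = experts[idx]
        out += gate * (np.maximum(x @ w_in, 0.0) @ w_out)  # ReLU MLP
    return out

token = rng.standard_normal(d_model)
_ = moe_layer(token)

total = sum(w1.size + w2.size for w1, w2 in experts)
active = top_k * (d_model * d_ff + d_ff * d_model)
print(f"expert params total: {total:,}, active per token: {active:,} "
      f"({active / total:.1%})")
```

In this toy setup, top-2-of-16 routing touches 12.5% of the expert weights per token, loosely mirroring Nemotron's 12B-of-120B ratio: the untouched experts still occupy memory but cost no compute for that token.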

MIT spinoff Liquid AI releases LFM2-24B-A2B, a hybrid mixture-of-experts model that activates only 2.3B parameters per token, fits in 32GB RAM, and hits 112 tokens per second on a consumer CPU.
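
Those three numbers are mutually consistent, as a quick back-of-envelope shows. The 8-bit weight assumption below is mine (the release does not state the deployed precision), and the ~2-FLOPs-per-active-parameter decode rule is a standard rough estimate:

```python
# Back-of-envelope check on why a 24B-total / 2.3B-active model can run
# on a consumer CPU. The 8-bit quantization assumption is ours, not a
# stated detail of the release.
TOTAL_PARAMS  = 24e9    # from the model name, LFM2-24B-A2B
ACTIVE_PARAMS = 2.3e9   # activated per token, per the announcement
BYTES_PER_W   = 1.0     # assume 8-bit quantized weights

weights_gb = TOTAL_PARAMS * BYTES_PER_W / 1e9
print(f"weight memory: ~{weights_gb:.0f} GB "
      f"(fits in 32 GB RAM with headroom for KV cache and activations)")

# Rough decode cost: ~2 FLOPs per active parameter per generated token.
flops_per_token = 2 * ACTIVE_PARAMS
tok_per_s = 112
print(f"sustained compute at {tok_per_s} tok/s: "
      f"~{flops_per_token * tok_per_s / 1e12:.1f} TFLOP/s")
```

That works out to roughly 0.5 TFLOP/s of sustained compute, comfortably within reach of a modern consumer CPU, which is what makes the 112 tokens-per-second figure plausible.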

Google DeepMind's Gemini 3.1 Pro leads on 13 of 16 benchmarks, scoring 77.1% on ARC-AGI-2 and 94.3% on GPQA Diamond, and pairs a 1M-token context window with input pricing of $2 per million tokens.

Alibaba's Qwen team releases Qwen3.5-397B-A17B, an open-weight mixture-of-experts model with native multimodal support and a hybrid attention architecture that runs 8x faster than its predecessor, under the Apache 2.0 license.

Shanghai AI lab StepFun open-sources Step 3.5 Flash, a 196B-parameter sparse MoE model that activates only 11B parameters per token while matching frontier models on reasoning, coding, and agentic benchmarks.
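
For scale, here are the active-parameter fractions of the sparse MoE releases in this digest, computed purely from the totals and active counts quoted above (a quick Python tally, nothing beyond the stated figures):

```python
# Active-parameter fractions for the sparse MoE releases above, using
# only the totals and active counts stated in each announcement.
models = {
    "Nemotron 3 Super (120B-A12B)": (120e9, 12e9),
    "LFM2-24B-A2B":                 (24e9, 2.3e9),
    "Qwen3.5-397B-A17B":            (397e9, 17e9),
    "Step 3.5 Flash (196B/11B)":    (196e9, 11e9),
}
for name, (total, active) in models.items():
    print(f"{name:30s} {active / total:6.1%} of parameters active per token")
```

All four land in the roughly 4-10% range, which is what lets their per-token compute track a far smaller dense model while total capacity stays large.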