
Nemotron 3 Nano 30B-A3B
NVIDIA's hybrid Mamba2+MoE model packs 31.6B total parameters but activates only 3.2B per token, delivering frontier-class reasoning with 3.3x the throughput of comparable models on a single H200 GPU.

A data-driven comparison of Alibaba's Qwen3.5-27B and Google's Gemma 3 27B - two 27B dense models that share a parameter count and almost nothing else.

A data-driven comparison of Alibaba's Qwen3.5-27B and Mistral's Small 3.2 - two Apache 2.0 dense models in the 24-27B range with very different benchmark profiles and deployment strengths.

A data-driven comparison of Alibaba's Qwen3.5-27B and Microsoft's Phi-4 - a 27B hybrid architecture versus a 14B STEM specialist, testing whether raw parameter count or training efficiency wins in practice.

Head-to-head comparison of Qwen3.5-35B-A3B and GLM-4.7-Flash - two Chinese-origin 30B-A3B MoE models with Apache 2.0/MIT licenses that dominate different benchmarks despite near-identical parameter budgets.

David vs. Goliath: Qwen3.5-35B-A3B activates 3B parameters and beats Llama 4 Scout's 17B active on MMLU-Pro, GPQA, and coding benchmarks - but Scout's 10M-token context window and native multimodal support tell a different story.