
GLM-4.7-Flash
Zhipu's GLM-4.7-Flash is a 30B-A3B MoE model that posts 59.2% on SWE-bench Verified and 79.5% on tau2-Bench while running on a single RTX 4090 - MIT licensed and free via the Z.AI API.
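
The "30B-A3B" label reads as roughly 30B total parameters with about 3B active per token. The back-of-envelope sketch below estimates weight memory and rests on a few assumptions: the nominal 30B figure, decimal gigabytes, and weights only, with no KV cache, activations, or quantization metadata. Under those assumptions, only around 4-bit precision leaves headroom on a 24 GB RTX 4090.

```python
# Rough weight-memory estimate for a nominal "30B-A3B" MoE model.
# Assumptions: 30e9 total parameters (nominal, not an exact count), decimal GB,
# weights only (no KV cache, activations, or quantization scales/zero-points).

GB = 1e9
RTX_4090_VRAM_GB = 24.0

def weight_gb(total_params: float, bits_per_param: float) -> float:
    """Memory needed to hold every weight at the given precision, in decimal GB."""
    return total_params * bits_per_param / 8 / GB

# In an MoE model every expert must stay resident, so TOTAL params set memory...
total_params = 30e9
# ...while ACTIVE params set per-token compute.
active_params = 3e9

for bits in (16, 8, 4):
    need = weight_gb(total_params, bits)
    verdict = "fits" if need < RTX_4090_VRAM_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {need:5.1f} GB -> {verdict} in {RTX_4090_VRAM_GB:.0f} GB")

print(f"active fraction per token: {active_params / total_params:.0%}")
```

Real 4-bit formats carry some per-block overhead, so the 4-bit figure is a lower bound. The remaining headroom still has to cover the KV cache, which grows with context length.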

Meta's Llama 4 Maverick packs 400B total parameters into a 128-expert MoE architecture with only 17B active per token, beating GPT-4o on Chatbot Arena while matching DeepSeek V3 on reasoning with less than half the active parameters.
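
The active-parameter comparison is simple arithmetic. The sketch below assumes DeepSeek V3's published figures of 671B total and 37B active parameters per token, which the blurb itself does not state. Dividing 17 by 37 gives roughly 0.46, so Maverick uses somewhat under half of DeepSeek V3's per-token active parameters.

```python
# Compare per-token active parameters behind the Llama 4 Maverick claim.
# Assumption (not in the blurb): DeepSeek V3 has 671B total / 37B active params.

models = {
    # name: (total params in billions, active params per token in billions)
    "Llama 4 Maverick": (400, 17),
    "DeepSeek V3": (671, 37),
}

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of all weights used in a single token's forward pass."""
    return active_b / total_b

for name, (total_b, active_b) in models.items():
    frac = active_fraction(total_b, active_b)
    print(f"{name}: {active_b}B of {total_b}B active ({frac:.1%})")

# Ratio of per-token active parameters (a proxy for per-token compute).
ratio = models["Llama 4 Maverick"][1] / models["DeepSeek V3"][1]
print(f"Maverick / DeepSeek V3 active params: {ratio:.2f}")
```

Active parameters only approximate per-token compute. Attention cost, context length, and routing overhead also matter, so the ratio is a rough guide rather than a measured speedup.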

Meta's Llama 4 Scout is a 109B-total, 17B-active MoE model with 16 experts, native multimodal support for text and images, and a 10M-token context window - the longest of any open-weight model.

Microsoft's 14B dense transformer consistently beats models 5x its size on MATH and GPQA and is available under the MIT license for unrestricted commercial use.

Mistral Large 3 is a 675B-parameter MoE model activating 41B per token with native multimodal support, a 256K context window, and Apache 2.0 licensing - Europe's first frontier-class open-weight model.

Mistral Small 3.2 is a 24B dense model with strong function calling, multimodal vision, and 128K context under Apache 2.0 - optimized for production tool-use pipelines and EU-compliant deployments.