
Qwen3.5-4B
Qwen3.5-4B is a 4B dense multimodal model that matches Qwen3-30B on MMLU-Pro and beats GPT-5-Nano on vision benchmarks. Runs on 8GB VRAM, Apache 2.0 licensed, 262K-1M context.

Qwen3.5-9B is a 9B dense model that outperforms Qwen3-30B on most benchmarks and beats GPT-5-Nano on vision tasks. Natively multimodal with 262K-1M context, Apache 2.0 licensed.

Alibaba releases official FP8-quantized weights for the Qwen3.5 flagship and the 27B dense model, cutting memory requirements roughly in half and enabling deployment on 8x H100 GPUs with native vLLM and SGLang support.
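An FP8 checkpoint with native vLLM support could, in principle, be served with vLLM's standard CLI; a minimal sketch, assuming a hypothetical Hub ID for the FP8 flagship (the exact repo name is not stated in the announcement):

```shell
# Hedged sketch: serve an FP8 checkpoint with vLLM across 8 GPUs.
# "Qwen/Qwen3.5-FP8" is a placeholder, not a confirmed model ID.
vllm serve Qwen/Qwen3.5-FP8 \
  --tensor-parallel-size 8 \
  --max-model-len 262144
```

vLLM detects FP8 quantization from the checkpoint config, so no extra quantization flag is required for pre-quantized weights.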

Comparing Kimi K2.5 and Qwen3.5 Flash - Moonshot AI's trillion-parameter frontier model against Alibaba's cheapest and fastest API offering.

Comparing Kimi K2.5's 1T-parameter benchmark dominance against Qwen3.5-122B-A10B's extraordinary parameter efficiency - and why the smaller model is harder to dismiss than the numbers suggest.

Comparing Kimi K2.5's trillion-parameter benchmark dominance against Qwen3.5-27B's single-GPU accessibility - two models from entirely different tiers that both have compelling use cases.