
NVIDIA's Secret Chip Fuses GPU and Groq for OpenAI
NVIDIA will unveil a new inference processor built on Groq's LPU architecture at GTC 2026, with OpenAI as its first major customer allocating 3 GW of dedicated capacity.

Groq's Language Processing Unit (LPU) is a purpose-built inference ASIC that trades HBM for 230 MB of on-chip SRAM, delivering deterministic latency and record-breaking tokens-per-second for LLM serving.
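Why does swapping HBM for on-chip SRAM matter? Autoregressive decoding is typically memory-bandwidth bound: every generated token requires streaming the model's weights through the compute units once, so throughput scales roughly with memory bandwidth. The sketch below makes that back-of-envelope argument concrete; all the specific figures (model size, quantization, bandwidth numbers) are illustrative assumptions, not vendor-confirmed specs.

```python
# Back-of-envelope: decode throughput when LLM inference is memory-bandwidth bound.
# Every figure below is an illustrative assumption, not a confirmed spec.

def tokens_per_second(bandwidth_bytes_per_s: float,
                      num_params: float,
                      bytes_per_param: float) -> float:
    """Each decoded token streams every weight once, so
    throughput ~= memory bandwidth / model size in bytes."""
    return bandwidth_bytes_per_s / (num_params * bytes_per_param)

TB = 1e12
PARAMS = 70e9        # assumed: 70B-parameter model
BYTES_PER_PARAM = 1  # assumed: 8-bit weights

# Assumed HBM bandwidth for a single modern GPU (~3.3 TB/s).
gpu_est = tokens_per_second(3.3 * TB, PARAMS, BYTES_PER_PARAM)

# Assumed aggregate SRAM bandwidth across the LPU rack serving the model
# (on-chip SRAM is far faster per chip; 10x aggregate is purely illustrative).
lpu_est = tokens_per_second(33 * TB, PARAMS, BYTES_PER_PARAM)

print(f"GPU-class estimate:  {gpu_est:.0f} tok/s")   # ~47 tok/s
print(f"SRAM-class estimate: {lpu_est:.0f} tok/s")   # ~471 tok/s
```

The order-of-magnitude gap, not the exact numbers, is the point: keeping weights in SRAM removes the HBM round-trip from the critical path, which is also what makes latency deterministic.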

Groq's LPU chips deliver inference speeds that make GPUs look slow: 1,200+ tokens per second on Llama 4. We benchmark latency, throughput, model availability, and pricing against the GPU-based competition.