
NVIDIA's Secret Chip Fuses GPU and Groq for OpenAI
NVIDIA will unveil a new inference processor built on Groq's LPU architecture at GTC 2026, with OpenAI as its first major customer allocating 3 GW of dedicated capacity.

Groq's Language Processing Unit (LPU) is a purpose-built inference ASIC that trades HBM for 230 MB of on-chip SRAM, delivering deterministic latency and record-breaking tokens-per-second for LLM serving.
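Why does swapping HBM for on-chip SRAM matter? Autoregressive decoding is typically memory-bandwidth bound: every generated token requires streaming the model's weights through the compute units once, so throughput scales roughly with memory bandwidth. The sketch below makes that back-of-envelope argument concrete; all the specific figures (model size, quantization, bandwidth numbers) are illustrative assumptions, not vendor-confirmed specs.

```python
# Back-of-envelope: decode throughput when LLM inference is memory-bandwidth bound.
# Every figure below is an illustrative assumption, not a confirmed spec.

def tokens_per_second(bandwidth_bytes_per_s: float,
                      num_params: float,
                      bytes_per_param: float) -> float:
    """Each decoded token streams every weight once, so
    throughput ~= memory bandwidth / model size in bytes."""
    return bandwidth_bytes_per_s / (num_params * bytes_per_param)

TB = 1e12
PARAMS = 70e9        # assumed: 70B-parameter model
BYTES_PER_PARAM = 1  # assumed: 8-bit weights

# Assumed HBM bandwidth for a single modern GPU (~3.3 TB/s).
gpu_est = tokens_per_second(3.3 * TB, PARAMS, BYTES_PER_PARAM)

# Assumed aggregate SRAM bandwidth across the LPU rack serving the model
# (on-chip SRAM is far faster per chip; 10x aggregate is purely illustrative).
lpu_est = tokens_per_second(33 * TB, PARAMS, BYTES_PER_PARAM)

print(f"GPU-class estimate:  {gpu_est:.0f} tok/s")   # ~47 tok/s
print(f"SRAM-class estimate: {lpu_est:.0f} tok/s")   # ~471 tok/s
```

The order-of-magnitude gap, not the exact numbers, is the point: keeping weights in SRAM removes the HBM round-trip from the critical path, which is also what makes latency deterministic.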

Groq's LPU chips deliver inference speeds that make GPUs look slow: 1,200+ tokens per second on Llama 4. We benchmark latency, throughput, model availability, and pricing against the GPU-based competition.