Inference

Etched Sohu - Transformer-Only Inference ASIC

Full specs and critical analysis of the Etched Sohu - a transformer-specific ASIC claiming 500K+ tokens/sec on Llama 70B, built on TSMC 4nm with 144GB HBM3E. Bold claims, but no independent benchmarks yet.

Hailo-10H - Edge AI With On-Device LLMs

Complete specs, benchmarks, and analysis of the Hailo-10H - a 2.5W edge AI accelerator with 40 TOPS INT4, on-module LPDDR4, and the ability to run LLMs and VLMs on a Raspberry Pi at 10 tokens per second.

NVIDIA Rubin CPX - Inference GPU With GDDR7

Full specs, benchmarks, and analysis of the NVIDIA Rubin CPX - a purpose-built inference GPU with 128GB GDDR7, 30 PFLOPS NVFP4, and 3x faster attention versus Blackwell, targeting million-token context workloads.