ElevenLabs Scribe v2 leads speech-to-text at 2.3% WER while ElevenLabs Flash v2.5 sets the pace for TTS with 75ms latency - but Google and Mistral are closing in fast.
IBM's new 1B-parameter speech model claims the top spot on the Open ASR Leaderboard while running on consumer hardware, beating Whisper Large V3 by 25% on word error rate.