Google’s SpecAugment Achieves Best-In-Breed Speech Recognition Without a Language Model

By Viraj T On Apr 23, 2019

Researchers at Google are applying computer vision techniques to images generated from sound waves to achieve best-in-breed speech recognition without a language model. The researchers say the new SpecAugment method needs no additional data or language model to recognize human speech accurately.

SpecAugment works by applying data augmentation, a technique common in computer vision, directly to spectrograms (visual representations of speech).
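To make the idea concrete, here is a minimal sketch of spectrogram masking in plain Python. The article does not spell out the individual augmentations; the published SpecAugment method describes time warping plus frequency and time masking, and this sketch covers only the two masking steps. The function name, the parameter names, the default widths, and the choice to fill masked cells with zero are all assumptions made for illustration, not Google's implementation.

```python
import random


def mask_spectrogram(spec, max_freq_width=2, max_time_width=3, rng=None):
    """Return a copy of `spec` with one frequency band and one time span zeroed.

    `spec` is a list of rows (frequency bins), each a list of values
    (time frames). All names and defaults here are hypothetical.
    """
    rng = rng or random.Random()
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: blank out f consecutive frequency bins.
    f = rng.randint(0, min(max_freq_width, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Time mask: blank out t consecutive time frames in every bin.
    t = rng.randint(0, min(max_time_width, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0

    return out


if __name__ == "__main__":
    # Toy 6-bin by 10-frame "spectrogram" filled with ones.
    toy = [[1.0] * 10 for _ in range(6)]
    for row in mask_spectrogram(toy, rng=random.Random(0)):
        print(" ".join("#" if v else "." for v in row))
```

Because the masks are drawn at random each time, a model trained this way sees a slightly different version of every utterance on every pass, which is what lets the method work without collecting additional data.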

“An unexpected outcome of our research was that models trained with SpecAugment out-performed all prior methods even without the aid of a language model,” Google AI resident Daniel S. Park and research scientist William Chan said in a blog post today. “While our networks still benefit from adding a language model, our results are encouraging in that it suggests the possibility of training networks that can be used for practical purposes without the aid of a language model.”

SpecAugment was applied to speech recognition on LibriSpeech 960h, where it obtained a 2.6% word error rate. The benchmark datasets used in the research are:

LibriSpeech 960h: about 1,000 hours of spoken English
Switchboard 300h: about 260 hours of telephone conversations in English
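For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. A 2.6% WER therefore means roughly 26 word errors per 1,000 reference words. The sketch below computes it with a standard word-level edit distance; the function name and the example sentences are illustrative assumptions, not taken from Google's evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"):
    # 2 errors over 6 reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"{wer:.1%}")  # prints 33.3%
```

Note that WER can exceed 100% when the hypothesis contains many inserted words, since the denominator is the reference length only.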

Automatic speech recognition converts human speech into machine-readable text, which a system can then process to produce a response. It underpins conversational AI, a technology used in a wide range of products such as Amazon’s Alexa. Google says that better conversational AI capabilities will help drive adoption of the technology and the products built on it.

Advances in computing have already drastically lowered error rates in speech recognition. Isolating background noise, for example, improves Alexa’s speech recognition by 15%.

We recently covered a semi-supervised training method for Alexa, which is said to improve voice recognition capabilities by 20%.