
Alexa Reduces Speech Recognition Errors by Leveraging Semi-Supervised Learning

Amazon’s Alexa group of scientists announced on April 4, 2019, that they had used a very large unlabeled data set to expose Alexa to a wide variety of human speech. The data set is perhaps one of the largest ever used to train an acoustic model, the scientists say. The aim is to help the intelligent assistant better understand human voices.

Semi-Supervised Learning, the technique through which this is being achieved, trains Artificial Intelligence engines such as Amazon’s Alexa on a combination of sounds tagged by human beings and sounds tagged by machines. The result was a 10-22% reduction in speech recognition errors. Scientists say this method works better than Supervised Learning, the conventional approach that relies on sounds tagged by human beings only.
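To make the idea concrete, here is a minimal self-training sketch in Python, assuming scikit-learn is available. The toy two-feature data, the logistic-regression model, and the set sizes are illustrative stand-ins, not Amazon’s actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_blobs(n):
    """Toy two-class data standing in for audio features."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + 3.0 * y[:, None]
    return X, y

X_lab, y_lab = make_blobs(200)       # stands in for the human-annotated hours
X_unlab, _ = make_blobs(20_000)      # stands in for the untagged audio

# "Teacher": plain supervised learning on the small human-labeled set.
teacher = LogisticRegression().fit(X_lab, y_lab)

# Machine-label the large unlabeled pool with the teacher's predictions.
y_pseudo = teacher.predict(X_unlab)

# "Student": trained on human labels and machine labels combined.
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, y_pseudo])
student = LogisticRegression().fit(X_all, y_all)
```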

Read More: PayPal’s First Blockchain Investment Is an Identity Ownership-Driven Start-Up

Discussing the development, Alexa Senior Applied Scientist Hari Parthasarathi stated in a blog post, “We are currently working to integrate the new model into Alexa, with a projected release date of later this year. The 7,000 hours of annotated data are more accurate than the machine-labeled data, so while training the student, we interleave the two. Our intuition was that if the machine-labeled data began to steer the model in the wrong direction, the annotated data could provide a course correction.”
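The interleaving Parthasarathi describes can be pictured as a batch schedule that periodically injects a human-annotated batch into the stream of machine-labeled batches. The helper below is a hypothetical sketch; the ratio parameter and the batch objects are assumptions for illustration, not published details:

```python
from itertools import cycle

def interleave_batches(annotated_batches, machine_batches, ratio=4):
    """Yield `ratio` machine-labeled batches per human-annotated batch.

    The small annotated set is recycled so it keeps providing the
    "course correction" described in the quote above.
    """
    annotated = cycle(annotated_batches)
    for i, machine_batch in enumerate(machine_batches):
        yield machine_batch
        if (i + 1) % ratio == 0:
            yield next(annotated)

# Usage: every fifth batch the student sees comes from the annotated data.
for batch in interleave_batches(["a1", "a2"], [f"m{i}" for i in range(8)]):
    print(batch)
```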

The acoustic model was trained with 7,000 hours of human-tagged sound data alongside as much as 1 million hours of untagged audio, a volume of training material the purely supervised method cannot exploit. Acoustic models are the component of automatic speech recognition responsible for converting human voices into voice commands.


Read More: Hyperledger Welcomes Nine New Members to Its Expanding Enterprise Blockchain Community

This major development in Alexa was also achieved through a training approach colloquially known as the ‘teacher-student’ method, applied here to long short-term memory (LSTM) networks. The teacher, already trained to classify 30-millisecond snippets of audio, transfers some of what it has learned to the student.
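A minimal teacher-student (knowledge distillation) sketch with LSTMs, assuming PyTorch, might look like the following. The feature size, hidden size, and 3,000-way output layer are illustrative choices, not Amazon’s published architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameClassifier(nn.Module):
    """LSTM that classifies each short audio frame into a sound unit."""
    def __init__(self, n_features=40, hidden=256, n_senones=3000):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_senones)

    def forward(self, x):                  # x: (batch, frames, features)
        h, _ = self.lstm(x)
        return self.out(h)                 # per-frame logits

teacher = FrameClassifier().eval()          # stands in for a model pretrained
student = FrameClassifier()                 # on the annotated data

x = torch.randn(8, 100, 40)                 # a batch of unlabeled audio frames
with torch.no_grad():
    soft_targets = F.softmax(teacher(x), dim=-1)

# The student learns to mimic the teacher's per-frame output distribution.
loss = F.kl_div(F.log_softmax(student(x), dim=-1),
                soft_targets, reduction="batchmean")
loss.backward()
```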

A number of other techniques were used as well, such as:

  • A single, rather than dual, pattern of student model analysis
  • Interleaving, or mixing, the human-annotated and machine-labeled data during training
  • Storing only the teacher model’s 20 highest-probability outputs per audio frame, compared to the traditional way of storing results across roughly 3,000 sound clusters
  • Making the student model learn from those top-20 teacher outputs (see the sketch below)
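The top-20 trick can be sketched as truncating the teacher’s per-frame output distribution before storing it, then training the student against the stored entries. Assuming PyTorch again; renormalizing the truncated probabilities is an assumption made for illustration, as the article does not say how the leftover probability mass is handled:

```python
import torch
import torch.nn.functional as F

def truncate_teacher_outputs(logits, k=20):
    """Keep only the teacher's top-k probabilities per frame."""
    probs = F.softmax(logits, dim=-1)
    top_p, top_idx = probs.topk(k, dim=-1)
    top_p = top_p / top_p.sum(dim=-1, keepdim=True)  # renormalize (assumed)
    return top_p, top_idx

def student_loss(student_logits, top_p, top_idx):
    """Cross-entropy of the student against the stored top-k targets."""
    log_q = F.log_softmax(student_logits, dim=-1)
    # Gather the student's log-probabilities at the teacher's top-k indices.
    return -(top_p * log_q.gather(-1, top_idx)).sum(-1).mean()
```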

Recently, Amazon also announced a 20% reduction in speech recognition errors, as well as a redesign of the Echo device that reduces the number of microphones from seven to two for better speech recognition.

Read More: AiThority Interview Series with Jeff Epstein, VP of Product at Comm100
