Alexa Reduces Speech Recognition Errors by Leveraging Semi-Supervised Learning

Amazon’s Alexa group of scientists announced on April 4, 2019, that they have used a very large unlabeled dataset to expose Alexa to a wide variety of human speech. The dataset is perhaps the largest ever used to train an acoustic model, the scientists say. The aim is to help the intelligent assistant better understand human voices.

Semi-Supervised Learning, the technique through which this is being achieved, trains Artificial Intelligence engines such as Amazon’s Alexa on a combination of sounds tagged by human beings and sounds tagged by machines. The result was a 10-22% reduction in speech recognition errors. The scientists say this method works better than Supervised Learning — the conventional approach that relies on human-tagged sounds alone.
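In practice, the machine tags typically come from a model that was itself trained on the human-tagged audio, an approach often called pseudo-labeling. Below is a minimal PyTorch sketch of that idea; the names (`pseudo_label`, `teacher`, `unlabeled_frames`) are hypothetical stand-ins, not Amazon’s actual code.

```python
# Minimal pseudo-labeling sketch: a model trained on human-tagged audio
# machine-tags a much larger untagged corpus.
import torch

def pseudo_label(teacher: torch.nn.Module,
                 unlabeled_frames: torch.Tensor) -> torch.Tensor:
    """Use a model trained on human-tagged audio to machine-tag new audio."""
    teacher.eval()
    with torch.no_grad():
        logits = teacher(unlabeled_frames)   # (batch, frames, classes)
    return logits.argmax(dim=-1)             # hard machine labels per frame

# Step 1: train `teacher` on the small human-annotated set (supervised).
# Step 2: machine-tag the large untagged corpus:
#     machine_labels = pseudo_label(teacher, unlabeled_frames)
# Step 3: train the production model on both label sources combined.
```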

Read More: PayPal’s First Blockchain Investment Is an Identity Ownership-Driven Start-Up

Discussing the development, Alexa Senior Applied Scientist Hari Parthasarathi stated in a blog post, “We are currently working to integrate the new model into Alexa, with a projected release date of later this year. The 7,000 hours of annotated data are more accurate than the machine-labeled data, so while training the student, we interleave the two. Our intuition was that if the machine-labeled data began to steer the model in the wrong direction, the annotated data could provide a course correction.”

The acoustic model was trained on 7,000 hours of human-tagged sound data together with machine-tagged audio drawn from as much as 1 million hours of untagged sound — a scale that purely supervised training, which depends on human tagging alone, could not reach. Acoustic models handle the part of automatic speech recognition that maps incoming audio to the sounds of speech, the first step in converting human voices into commands.
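The interleaving Parthasarathi describes can be pictured as alternating batches from the two data sources, so the cleaner human annotations keep correcting the training trajectory. The article does not specify the mixing schedule, so the 4:1 ratio below is an arbitrary placeholder; a minimal sketch:

```python
from itertools import cycle, islice

def interleave(annotated_batches, machine_batches, ratio: int = 4):
    """Yield one human-annotated batch after every `ratio` machine-labeled
    batches; the annotated data acts as a periodic course correction."""
    annotated = cycle(annotated_batches)   # small set, so recycle it
    machine = iter(machine_batches)
    while True:
        chunk = list(islice(machine, ratio))
        if not chunk:                      # machine-labeled data exhausted
            return
        yield from chunk
        yield next(annotated)
```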

Read More: Hyperledger Welcomes Nine New Members to Its Expanding Enterprise Blockchain Community

This major development in Alexa was also achieved through a method commonly known as ‘teacher-student’ training, in which both networks are long short-term memory (LSTM) models. The teacher, already trained to classify 30-millisecond snippets of audio, transfers some of what it has learned to the student, which is trained to reproduce the teacher’s outputs.
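A minimal PyTorch sketch of frame-level teacher-student training with LSTM acoustic models follows. The feature size, hidden size, and softmax temperature are placeholder assumptions (temperature is a standard distillation detail the article does not mention):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AcousticLSTM(nn.Module):
    """Toy stand-in for an LSTM acoustic model: audio frames in, class scores out."""
    def __init__(self, feat_dim=40, hidden=256, n_classes=3000):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, frames):             # frames: (batch, time, feat_dim)
        h, _ = self.lstm(frames)
        return self.out(h)                 # per-frame logits

def distill_step(teacher, student, frames, optimizer, T=2.0):
    """One teacher-student step: the student matches the teacher's soft
    per-frame output distribution via a KL-divergence loss."""
    with torch.no_grad():
        soft_targets = F.softmax(teacher(frames) / T, dim=-1)
    log_probs = F.log_softmax(student(frames) / T, dim=-1)
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In training, each audio batch would pass through `distill_step`, with machine-labeled batches interleaved with annotated ones as sketched earlier.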

A number of other techniques were used as well, such as:

  • Using a single, rather than dual, stage of student-model training
  • Interleaving, or mixing, the human-annotated and machine-labeled data during training
  • Storing only the teacher model’s 20 highest-probability outputs per audio frame, rather than its full distribution over roughly 3,000 clusters of sounds
  • Having the student model learn from those top-20 teacher outputs, as shown in the sketch after this list
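The last two items can be pictured together: the teacher’s per-frame output distribution is truncated to its top 20 entries before being stored, and the student’s loss is computed only against those entries. A sketch under the same assumptions as above; renormalizing the truncated probabilities is an assumption on my part, not something the article specifies:

```python
import torch
import torch.nn.functional as F

def top_k_targets(teacher_logits: torch.Tensor, k: int = 20):
    """Keep only the teacher's k highest-probability outputs per frame,
    instead of all ~3,000, shrinking what must be stored."""
    probs = F.softmax(teacher_logits, dim=-1)
    top_p, top_idx = probs.topk(k, dim=-1)
    return top_p / top_p.sum(dim=-1, keepdim=True), top_idx  # renormalized

def sparse_distill_loss(student_logits, top_p, top_idx):
    """Cross-entropy of the student against the stored top-k teacher targets."""
    log_probs = F.log_softmax(student_logits, dim=-1)
    picked = log_probs.gather(-1, top_idx)  # student log-probs at teacher's top-k
    return -(top_p * picked).mean()
```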

Recently, Amazon also announced a 20% reduction in speech recognition errors, as well as a design change to the Echo device that reduces the number of microphones from seven to two for better speech recognition.

Read More: AiThority Interview Series with Jeff Epstein, VP of Product at Comm100
