Mind The (Accent) Gap: DefinedCrowd Contributing To More Inclusive Speech Technology

By AIT News Desk On Jul 30, 2021

In a drive to address bias in speech technology, DefinedCrowd is offering AI developers free speech datasets to enable them to test how well their speech recognition models understand nonnative English speakers with a variety of accents.

DefinedCrowd, the one-stop-shop for high-quality artificial intelligence training data, released the first of a series of free Spanish-accented English speech datasets to allow AI developers to test how well their speech recognition models understand nonnative English speakers, a demographic represented by over 35 million people in the United States.

“There is an accent gap in speech technology. Research shows that speech recognition technologies are not nearly as accurate in understanding nonnative accents as they are in understanding white, non-immigrant, upper-middle-class Americans,” said Dr. Daniela Braga, founder and CEO of DefinedCrowd.

This is not surprising: it is this demographic that had access to the technology and trained it from the beginning. To address the bias present in speech recognition technology, DefinedCrowd has released the first of four sets of Spanish-accented English speech datasets, which developers can use to test or benchmark their models and identify bias and areas that need more training data.

“Unfortunately, it has resulted in models that are more useful to some people than to others. And that must change,” said Dr. Braga.

However, many companies do not have the resources to train or test their systems with different accents, meaning that speech recognition systems are likely to provide an unresponsive, inaccurate, and even isolating experience to nonnative English speakers.

This is clearly bad for business: according to the U.S. Census, over 35 million people in the United States are native speakers of a language other than English. Sixty percent of these people speak Spanish at home.

“For companies with AI solutions to compete in the large nonnative English-speaking market in the U.S., speech models need to be able to understand a wide range of different Spanish accents, originating from all the Americas,” said Christopher Shulby, Director of Machine Learning Engineering at DefinedCrowd.

The first dataset, released in two phases, contains Spanish-accented English data from across the Americas: Argentina, Brazil, Canada, Chile, Colombia, Dominican Republic, Guatemala, Honduras, Mexico, Nicaragua, Panama, Peru, the United States, Uruguay and Venezuela.

Subsequent releases will include datasets from native Spanish speakers from around the world, including Australia, China, Finland, France, Germany, India, Israel, Italy, Norway, Portugal, Russia, Spain, Sweden, and the United Kingdom.

The datasets represent speakers aged 18 to 40, with an equal distribution of male and female speakers.
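For readers who want a concrete picture of what benchmarking a model against accented speech can look like, the following is a minimal sketch, not DefinedCrowd's method. It assumes a developer already has, for each recording, an accent label, a reference transcript, and the model's output; the function names and the sample sentences are hypothetical. It computes word error rate (WER) per accent group, so that a gap between groups becomes visible.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Row-by-row dynamic programming for Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_accent(samples):
    """samples: iterable of (accent, reference, hypothesis) tuples.

    Returns a dict mapping each accent label to its aggregate WER,
    i.e. total word errors divided by total reference words.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for accent, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[accent] += e
        words[accent] += n
    return {a: errors[a] / words[a] for a in words if words[a] > 0}


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    samples = [
        ("mexico", "turn on the light", "turn on the light"),
        ("chile", "turn on the light", "turn of the night"),
    ]
    for accent, wer in sorted(wer_by_accent(samples).items()):
        print(f"{accent}: WER = {wer:.2f}")
```

A markedly higher WER for one accent group than for another is the kind of signal that would point to where more training data is needed.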

To repurpose or use any of the content or material on this and our sister sites, explicit written permission needs to be sought.

Copyright © 2026 AiThority. All Rights Reserved. Privacy Policy
