Soniox Speech AI Achieves Extreme Accuracy
- Soniox launched new foundational AI models for speech recognition, achieving extremely high accuracy rates.
- Soniox’s new AI models often surpass human performance, delivering more accurate speech recognition and generating properly formatted text.
- Soniox’s speech recognition AI consistently outperforms OpenAI, Google, and other providers, with accuracy improvements ranging from 24% to 78%, making it a game-changer for voice and speech applications.
- Soniox also released the Soniox mobile app and Soniox Playground, allowing you to experience the new era of voice AI firsthand.
Foundational AI breakthroughs are challenging to achieve in a startup environment due to the cost and complexity of processing and training large models on internet-scale data. However, Soniox did not shy away from the challenge and built its infrastructure from the ground up to efficiently process and train large models on massive amounts of audio and text.
Specifically, Soniox processed over 1 million hours of audio data for training. The entire training process was completed on a single A100 server (8xA100 GPUs) in less than 4 weeks! This engineering achievement alone saved millions of dollars in processing and training costs.
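To put those figures in perspective, the short Python sketch below estimates the throughput they imply. It is back-of-the-envelope arithmetic only, not Soniox's own accounting: it assumes exactly 1 million hours of audio, one pass over the data, and a full 28 days of wall-clock time. Since the article says "over 1 million hours" and "less than 4 weeks", the result is a lower bound for a single pass. The function name and constants are illustrative.

```python
# Figures quoted in the article (treated here as exact, which is an assumption).
AUDIO_HOURS = 1_000_000   # "over 1 million hours of audio"
GPU_COUNT = 8             # one server with 8xA100 GPUs
WALL_CLOCK_DAYS = 28      # "less than 4 weeks"


def implied_throughput(audio_hours: float, gpu_count: int, days: float):
    """Return (server, per_gpu) audio hours consumed per wall-clock hour.

    Assumes a single pass over the data; several passes would imply a
    proportionally higher throughput.
    """
    wall_clock_hours = days * 24
    server_rate = audio_hours / wall_clock_hours
    return server_rate, server_rate / gpu_count


server_rate, per_gpu_rate = implied_throughput(AUDIO_HOURS, GPU_COUNT, WALL_CLOCK_DAYS)
print(f"Whole server: about {server_rate:.0f} hours of audio per wall-clock hour")
print(f"Per GPU:      about {per_gpu_rate:.0f} hours of audio per wall-clock hour")
```

Under these assumptions the server has to consume roughly 1,488 hours of audio per hour of wall-clock time, or about 186 times faster than real time on each GPU.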
Novel AI Models:
Achieving high accuracy with low-latency constraints is one of the most challenging problems in AI today. Why? The AI model has to constantly make decisions (e.g., output words) in real time while dealing with a high level of uncertainty and missing information. This challenge is not limited to speech recognition but extends to robotics, which faces similar issues.
To effectively solve this problem, Soniox had to design new and more efficient neural network architectures and develop new criteria that inherently prioritize low-latency decision-making while still considering accuracy. Although Soniox has been training these models for the past year, the improvements were incremental until a breakthrough about 6 months ago.
Path Towards Human-Parity:
In the last year, there have been releases of speech recognition models from Google, Meta, and other companies that support one thousand or more languages. “What all of these approaches fail to address is accuracy. Speech recognition is all about accuracy, period,” said Klemen Simonic, Founder and CEO of Soniox. “Achieving human-parity or superhuman accuracy is of paramount importance, rather than settling for a solution with a misrecognition rate of 20% or higher, which proves to be useless for most applications.”
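The article does not say how the "misrecognition rate" is measured. A common metric for speech recognition is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The Python sketch below assumes that definition; the function name and the sample sentences are illustrative and do not come from Soniox.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the
    number of words in the reference transcript."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref_words)


reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over a lazy dog today"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")  # prints "WER: 20%"
```

In this example two of the ten reference words are wrong, which is already the 20% level the quote describes as useless for most applications.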
Soniox is introducing highly accurate models for nine languages, starting with English and Korean. For many of these languages, this will mark the first introduction of highly accurate speech recognition AI. Soniox is looking forward to collaborations with various companies worldwide, and believes this could represent a breakthrough moment for numerous voice and speech applications.