AiThority Interview Series With Peter Cahill, CEO, Voysis
Peter is the founder and CEO of Voysis. He has over 15 years’ experience in speech technology and neural network R&D. Peter is an active member of the speech research community, where he chairs SynSIG. Prior to Voysis, Peter was part of a group of scientists that attracted a total of $117M in funding for ADAPT, formerly CNGL. He is a graduate and former faculty member of University College Dublin.
At Voysis, we believe voice will soon be the first point of contact between ‘man’ and machine. We believe that voice-driven natural language interfaces will change the way people interact with consumer- and enterprise-facing applications by creating more intuitive, efficient, and personalized experiences. We believe Voysis is the complete voice AI platform that will play a key role in bringing about this change.
Tell us about your journey into Artificial Intelligence. What made you launch a voice AI platform?
It has always been inevitable that voice would become a standard human-computer interface at some point. In the past few years, we’ve seen the emergence of the first widely used consumer-facing voice AIs (e.g., Alexa, Siri, Home), yet the only companies that owned a competing tech stack for such interfaces were the top 4 platform companies. In fact, outside of the top platform companies, there really aren’t many examples of consumer-facing voice interfaces. That’s where I saw an opportunity.
I started working in text-to-speech technologies back in 2003, and have been working full time in voice since then.
Today, Voysis is the only independent provider that owns the full stack of technologies, enabling any business to add voice interfaces natively into their products.
How do you make AI deliver economic benefits as well as social goodwill?
I believe Voice AI will make the Internet and technologies accessible to billions of people, while also removing friction from many of the graphical user interfaces we use today. A significant amount of the planet’s population cannot access the Internet today for various reasons, including literacy, the languages they speak, and technical knowledge.
Voice interfaces and language technologies in general will play a huge role in making the Internet accessible. Not only is literacy no longer required, but content can be translated on the fly, and no technical know-how is needed to ask a question via voice. The impact won’t just meet the needs of developing countries; it will also extend to young children and the elderly, globally.
How do deep learning algorithms, speech recognition and natural language processing technologies converge at Voysis?
It’s really important for these technologies to be tightly coupled. All too often we see companies trying to build products around generic ‘one-size-fits-all’ black-box components (ASR, TTS, NLP, etc.), but then they learn that such an approach has an extremely hard ceiling in terms of quality, and they end up never shipping their product as a result.
Voysis has developed a unified deep learning platform where all these technologies exist in unison, enabling this tight coupling which is critical for a quality product.
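The benefit of coupling these components can be illustrated with a toy sketch. This is not Voysis’s actual method; it is a hypothetical example (all function names, scores, and data are invented) of a well-known general idea: instead of a black-box ASR committing to a single transcript, a coupled system lets a domain-aware language model rescore the ASR’s n-best hypotheses.

```python
# Hypothetical sketch: decoupled pipeline vs. coupled ASR + domain NLU.
# A decoupled pipeline takes the ASR's top hypothesis as-is; a coupled
# system rescores the n-best list with domain knowledge. Data is invented.

def nlu_score(text, domain_vocab):
    """Fraction of words the toy domain model recognizes."""
    words = text.lower().split()
    return sum(w in domain_vocab for w in words) / len(words)

def decoupled(n_best, domain_vocab):
    # Black-box ASR: commit to the top acoustic hypothesis, ignore the rest.
    return n_best[0][0]

def coupled(n_best, domain_vocab):
    # Blend acoustic confidence with the domain NLU's score (weights invented).
    return max(n_best,
               key=lambda h: 0.5 * h[1] + 0.5 * nlu_score(h[0], domain_vocab))[0]

# Toy n-best list from a music query: (hypothesis, acoustic confidence).
n_best = [("play some jazz hands", 0.71),
          ("play some jazz bands", 0.69)]
vocab = {"play", "some", "jazz", "bands", "songs", "albums"}

print(decoupled(n_best, vocab))  # the ASR's raw top guess
print(coupled(n_best, vocab))    # domain-aware rescoring chooses differently
```

The decoupled version returns the slightly-higher-confidence but domain-implausible transcript; the coupled version recovers the in-domain one, which is the kind of quality ceiling a loosely coupled pipeline hits.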
What are the foundational tenets of owning customized voice AI?
User experience is what it’s all really about, not technology. Voice AI, when deployed in the right way, empowers consumers to do things they couldn’t do before. For example, consider natural language search vs keywords.
For decades we’ve been conditioned to use keyword-based Internet search, while the reality is that I don’t want a billion results for a search query; I want one page of meaningful results at most. Natural language is far more descriptive than keywords, and as a result, can deliver better search results.
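The search contrast above can be sketched in a few lines. This is a deliberately crude illustration with an invented four-item catalog, not a real search engine: keyword search OR-matches terms and floods the user with results, while parsing the natural-language query into structured constraints returns only what was asked for.

```python
# Toy illustration (invented data): keyword OR-matching vs. a crude
# natural-language parse into structured constraints.
import re

CATALOG = [
    {"name": "red running shoes", "price": 45},
    {"name": "red dress shoes", "price": 120},
    {"name": "blue running shoes", "price": 40},
    {"name": "red running jacket", "price": 55},
]

def keyword_search(query, catalog):
    # OR-match: any item sharing at least one keyword with the query matches.
    terms = set(query.lower().split())
    return [item for item in catalog if terms & set(item["name"].split())]

def nl_search(query, catalog):
    # Extract a price constraint ("under $50") and treat the remaining
    # words as required attributes -- a stand-in for real NLU.
    m = re.search(r"under \$?(\d+)", query)
    cap = int(m.group(1)) if m else float("inf")
    terms = set(re.sub(r"under \$?\d+", "", query).lower().split())
    return [item for item in catalog
            if terms <= set(item["name"].split()) and item["price"] < cap]

print(len(keyword_search("red running shoes under $50", CATALOG)))  # 4 items
print(len(nl_search("red running shoes under $50", CATALOG)))       # 1 item
```

The keyword version returns the whole catalog because every item shares a term with the query; the natural-language version returns the single item the user actually described.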
Second to user experience, we find that people who are new to this space often tend to think of spoken language and written language as the same thing. In practice, spoken language has more dimensions to it – sometimes even a pause or a hesitation can mean a lot! As a result, you can learn a tremendous amount about people and the context in which they consume content.
Tell us about your AI research programs and the most outstanding digital campaign at Voysis or elsewhere.
Voysis is very active in R&D across the complete stack of voice and natural language technologies, including speech recognition, text-to-speech, natural language understanding, and dialogue management. Most notable is perhaps our work on WaveNet, which was covered by Forbes in 2017.
What are the major challenges for AI technology companies in making it more accessible to local communities? How do you overcome these challenges?
I think interacting with AI is still quite a new concept to most of us. The ‘first-gen’ voice systems that we know well (Siri, Alexa), are still quite canned. We tend to think of them as ‘instructional’ rather than ‘intelligent’. Yet, when we focus our efforts on a particular domain (e.g. music or eCommerce), modern technologies are good enough to come across as intelligent – as enough of the context is set.
What AI start-ups and labs are you keenly following?
I’d probably pay more attention to what specific individuals are publishing than to any labs in general. The top two labs right now are DeepMind and the Google Brain team, but they work on such a broad range of problems, it’s hard to stay on top of it all!
What technologies within AI and computing are you interested in?
All aspects of machine learning are interesting. Deep learning based techniques are the clear leader for many problems these days, but that can always change so it’s good to keep up to date on as many approaches as possible.
At the moment, WaveNet-type models are in their infancy, and I think we’ll see them impact speech recognition and other problems in the near future. Likewise, reinforcement learning for dialogue management is very interesting. It feels as though it’s early days there, where the biggest challenge is the extra dimension that dialogue adds on top of voice. I think many consumers would like a voice interface that learns from their interactions with it, and I suspect reinforcement learning will be the stepping stone to enable that!
As an AI leader, which industries do you think will be fastest to adopt AI/ML with smooth efficiency? What are the new emerging markets for AI technology?
The industries that need it most are those where humans are directly involved, and as a result, they’ll be the slowest to adopt. Examples include automotive and healthcare. Companies that want AI to make an existing process more efficient will be the first to adopt it. Many organizations have rules-based solutions to various problems, and modern AI techniques can easily slot in with minimal disruption.
User interfaces are ideal for AI because if they get something wrong, it’s not as catastrophic as a car crashing or a robot surgeon misfiring. If we think of how we interact with the Internet, the user interface is what empowers us, but it is also what constrains us in terms of what we can do. I think AI will help us see the Internet through a different lens.
Thank you, Peter! That was fun, and we hope to see you back on AiThority soon.