By AI Trends staff
Advances in the AI behind speech recognition are driving growth in the market, attracting venture capital for startups that are challenging established players.
The growing adoption of speech recognition devices is driving the market, which is expected to reach $26.8 billion globally by 2025, according to estimates by Meticulous Research cited by Analytics Insight. Better speed and accuracy are among the benefits of the emerging technology.
One company on the cusp of this new growth, AssemblyAI in San Francisco, offers an API for speech recognition that can transcribe videos, podcasts, phone calls and remote meetings. The company was founded by CEO Dylan Fox in 2017 and has received backing from Y Combinator, a startup accelerator, as well as NVIDIA.
Fox has an unusual background for a high-tech entrepreneur. He is a graduate of George Washington University with degrees in business administration, business economics and public policy. He landed a job as a machine learning software engineer at Cisco’s startup product lab in San Francisco, working on deep neural networks and machine learning. There he got the idea for AssemblyAI and raised capital from Y Combinator, which enabled him to hire data scientists and data engineers to get the technology off the ground.
Asked in an interview with AI Trends how he made the transition from business administration and economics undergraduate to high-tech entrepreneur, Fox said, “I was looking for a more challenging programming challenge, which led to natural language processing, which led me to Cisco.” At the time, the group there was working on an enterprise counterpart to Apple’s Siri.
To speed things up, Cisco was looking to acquire speech recognition software, and Fox was in the catbird seat for the search. “We looked at Nuance,” for example, which is recognized as the market leader and owns more speech recognition software than its competitors. (Microsoft’s $19.7 billion acquisition of Nuance is expected to close by the end of the year.) The young, budding entrepreneur was unimpressed. “It was crazy how bad all the options were in terms of accuracy and developer [experience],” he said.
He was impressed by San Francisco-based Twilio, which in 2008 released the Twilio Voice API for making and receiving phone calls, hosted in the cloud. The company has since raised $103 million in venture capital. “They were setting new standards for a good API for developers,” Fox said.
Fox’s idea was to use artificial intelligence and machine learning to achieve “super-accurate results” and make it easy for developers to incorporate the API into their products. One customer is CallRail, which offers call tracking and marketing analytics software and plans to incorporate AssemblyAI’s API to understand why people call. Other customers include NBC and the Wall Street Journal, which use the product to transcribe content and interviews and to provide closed captioning.
“We have worked on getting as close as possible to the quality of human speech recognition. It’s been a lot of work,” Fox said. He expects to reach that level in 2022.
He targets companies that build speech recognition into their products, and he aims to make the API easy to buy. Customers pay on a usage basis; for every second of audio processed, AssemblyAI charges a fraction of a penny. Customers receive monthly bills. A customer that uses 10 hours a month pays about nine dollars. A customer that uses a million hours a month pays about $900,000.
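The arithmetic behind those two figures can be checked with a short sketch. The per-second rate here is an assumption inferred from the article's "10 hours for about nine dollars" example (roughly $0.90 per hour); it is not an official AssemblyAI price, and the function name `monthly_bill` is purely illustrative.

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600

# Assumption: rate inferred from the article's "10 hours, about nine dollars"
# example. It is not taken from any official price list.
RATE_PER_SECOND = Decimal("9") / (10 * SECONDS_PER_HOUR)  # $0.00025 per second


def monthly_bill(seconds_transcribed: int) -> Decimal:
    """Return the usage-based charge in dollars, rounded to whole cents."""
    return (RATE_PER_SECOND * seconds_transcribed).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # The two usage levels mentioned in the article.
    for hours in (10, 1_000_000):
        bill = monthly_bill(hours * SECONDS_PER_HOUR)
        print(f"{hours:>9,} hours -> ${bill:,.2f}")
```

At that inferred rate, a second of audio costs a fortieth of a cent, which matches the "fraction of a penny" description, and the bill scales linearly from the 10-hour example to the million-hour one.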
Voice recognition is a hot market. “There are a lot of new startups opening up,” Fox said of the opportunity. “There are a lot of exciting new businesses being built on voice data.”
AssemblyAI’s product can detect sensitive topics such as hate speech and profanity, so customers can save on human content moderation.
Asked to describe what makes his technology different, Fox said: “We are an experienced team of deep learning researchers” with experience at BMW, Apple and Facebook. “We’re building very large, very accurate deep learning models that have recognition results much more accurate than traditional machine learning approaches. We build really big models using advanced neural network technologies.” He compared the approach to what OpenAI uses to develop its GPT-3 large language model.
In addition, they build AI features on top of transcripts to provide summaries of audio and video content that can be searched and indexed. “It goes beyond just transcription,” Fox said.
The company now has 25 employees and expects to double its headcount in about four months. Business has been good. “There’s an explosion of online audio and video data, and customers want to take advantage of that, so we’re seeing a lot of demand,” Fox said.
Learn more at AssemblyAI.