Spotify has bigger plans for the technology behind its new AI DJ feature after seeing a positive consumer response. Launched just before last week’s Stream On event in Los Angeles, the AI DJ curates a personalized selection of music paired with spoken commentary delivered in a realistic-sounding, AI-generated voice. Under the hood, the feature uses the latest in AI technology and Large Language Models, as well as generative voice, all layered on top of Spotify’s existing investments in personalization and machine learning.
These new tools aren’t necessarily limited to one feature, Spotify believes, which is why it’s now testing other uses for the technology.
While the highlight of Spotify’s Stream On event was the revamped mobile app, which now focuses on TikTok-like discovery streams for music, podcasts and audiobooks, AI DJ is now a prominent part of the streaming service’s new experience. Introduced to Spotify Premium subscribers in the US and Canada in late February, DJ is designed to get to know users well enough to play what they want to hear at the touch of a button.
With the app upgrade, DJ will appear at the top of the screen under the Music sub-stream for subscribers, serving as both a seamless way to stream favorite music and a way to push free users to upgrade.
To create commentary to accompany music on DJ streams, Spotify says it used the knowledge base and insights of its own music experts. Using OpenAI’s generative AI technology, the DJ can then tailor that commentary to the app’s end users. And unlike ChatGPT, which tries to generate answers by distilling information found on the wider web, Spotify’s more limited database of musical knowledge ensures that the DJ’s commentary is both relevant and accurate.
The actual music selections chosen by the DJ come from Spotify’s existing understanding of the user’s tastes and interests, mirroring what was previously programmed into personalized playlists like Discover Weekly and others.
AI DJ’s voice, meanwhile, was created using technology Spotify acquired from Sonantic last year and is based on the voice of Spotify’s head of cultural partnerships, Xavier “X” Jernigan, host of Spotify’s now-defunct morning show podcast, The Get Up. Surprisingly, the voice sounds incredibly realistic and not at all robotic. (During a Spotify live event, Jernigan spoke alongside his AI double, and the differences were hard to tell. “I can listen to my own voice all day,” he joked.)
“The reason it sounds so good, in fact, that’s the goal of the Sonantic technology, the team we’ve acquired. It’s about the emotion of the sound,” Spotify’s head of personalization Ziad Sultan explained to TechCrunch after Stream On ended. “When you listen to the AI DJ, you’ll hear where there’s a pause for breath. You will hear different intonations. For some genres, you can hear excitement,” he says.
Natural-sounding AI voice is, of course, nothing new. Google wowed the world years ago with its own human-sounding AI creation. But its implementation in Duplex led to criticism because the AI called businesses on behalf of the end user without first revealing that it wasn’t a real person. There should be no such concern with Spotify’s feature, given that it is literally called ‘AI DJ’.
To make Spotify’s AI voice sound natural, Jernigan went into the studio to produce high-quality audio recordings while working with audio technology experts. There, he was instructed to read different lines using different emotions, which were then fed into an artificial intelligence model. Spotify isn’t saying how long this process takes or detailing the specifics, saying the technology is evolving and calling it its “secret sauce.”
“From that high-quality input that has many different conversions, [Jernigan] then doesn’t need to say anything anymore. Now it’s purely artificial intelligence,” Sultan says of the generated voice. Still, Jernigan occasionally pops into Spotify’s writers’ room to give feedback on how he would read a line, so that he continues to have input.
But while AI DJ is built on a combination of Sonantic and OpenAI technologies, Spotify is also investing in internal research to better understand the latest in AI and Large Language Models.
“We have a research team working on the latest language models,” Sultan told TechCrunch. In fact, Spotify has several hundred employees working on personalization and machine learning. In the case of AI DJ, the team uses the OpenAI model, Sultan notes. “But overall, we have a great research team that understands all the possibilities of large language models, generative voice, personalization. This is fast-paced,” he says. “We want to be known for our AI expertise.”
However, Spotify may or may not use its own AI technology to enable future developments. It may decide that it makes more sense to work with a partner, as it does now with OpenAI. But it is still too early to say.
“We publish papers all the time,” says Sultan. “We will invest in the latest technologies. As you can imagine, in this field, LLMs are such a technology. So we will develop the expertise.”
With this foundational technology, Spotify can advance into other areas that involve AI, LLMs, and generative AI technologies. What those areas might look like in terms of consumer products, the company is not yet saying. (We’ve heard that a chatbot like ChatGPT is among the options being tested, though.)
“We haven’t announced exact plans for when we might expand into new markets, new languages, etc. But it is a technology that is a platform. We can do that, and we hope to share more as it develops,” says Sultan.
Early consumer feedback on AI DJ is promising, says Spotify
The company didn’t want to develop a full suite of AI products because it wasn’t sure what the consumer response to DJ would be. Would people like to have an AI DJ? Would they engage with this feature? None of it was clear. After all, Spotify’s voice assistant (“Hey Spotify”) was hurt by a lack of adoption.
But there were early signs that the DJ feature might work well. Spotify had tested the product with employees before launch, and usage and re-subscription metrics were “very, very good.”
Public adoption, so far, has matched what Spotify has seen internally, Sultan says. That means there is an opportunity to spin off future products using the same underlying foundations.
“People spend hours a day with this product… it helps them with selection, discovery, it tells them the next music they should listen to and tells them why… so the feedback, if you check different social media, you will see that it is very positive, emotional,” says Sultan.
In addition, Spotify shared that on days when users tuned in, they spent 25% of their listening time with the DJ, and more than half of first-time listeners returned the very next day. These metrics are early, however, as the feature has not yet fully rolled out in the US and Canada. Still, the company believes they are promising.
“I think it’s an amazing step in building really valuable product-user relationships,” says Sultan. But he warns that the challenge ahead will be “finding the right application and then building it right.”
“In this case, we said this is an AI DJ for music. We created the writers’ room for that. We put it in the hands of users to do exactly what it was designed to do. It works super well. But it’s definitely fun to dream about what else we can do and how fast we can do it,” he adds.