Spotify may use AI to make host-read podcast ads that sound like real people

With Spotify’s AI DJ, the company trained an AI on the voice of a real person: its Head of Cultural Partnerships and podcast host, Xavier “X” Jernigan. Now, it seems, the streamer may turn that same technology to advertising. According to statements made by The Ringer founder Bill Simmons, the streaming service is developing AI technology that will be able to use a podcast host’s voice to make host-read ads, without the host having to read and record the ad copy.

Simmons made the remarks on a recent episode of “The Bill Simmons Podcast,” saying, “There’s going to be a way to use my voice for commercials. Obviously, you have to give the voice approval, but it opens up, from an advertising standpoint, all these different great possibilities for you.”

He said these ads could open up new opportunities for podcasters because they could target ads geographically, like tickets to a local event in the listener’s hometown, or even create ads in different languages, with the host’s permission.

His comments were first reported by Semafor.

Spotify acquired The Ringer in 2020, but it was unclear if Simmons was authorized to discuss the streamer’s plans in this area, as he began by saying, “I don’t think Spotify will be mad at me about this…” before sharing the information.

When asked for comment, Spotify did not directly confirm or deny the development of the feature.

“We are always working to improve the Spotify experience and test new offers that benefit creators, advertisers and users,” a Spotify spokesperson told TechDigiPro. “The AI landscape is rapidly evolving and Spotify, which has a long history of innovation, is exploring a wide range of applications, including our very popular AI DJ feature. There has been a 500 percent increase in the number of daily podcast episodes discussing AI over the last month, including the conversation between Derek Thompson and Bill Simmons. Advertising represents an interesting canvas for future exploration, but we have nothing to announce at this time.”

The subtext of this comment indicates that Simmons’ statements may have been somewhat premature.

That said, Spotify has already hinted that the AI DJ in the app today wouldn’t be the only AI voice users would encounter in the future. When Jernigan was recently asked about Spotify’s plans to work with other voice models in the future, he quipped, “stay tuned.”

The streamer has also been quietly investing in AI research and development, with a team of a few hundred now working in areas like personalization and machine learning. Additionally, the team has been using OpenAI’s model and investigating the possibilities of large language models, generative speech, and more.

Spotify’s ability to create AI voices specifically leverages IP from its 2022 acquisition of Sonantic, combined with OpenAI technology. It may choose to use its own in-house AI technology in the future, the company recently told us.

To create AI DJ, Spotify had Jernigan go into a studio to produce high-quality recordings, including ones where he reads lines with different cadences and emotions. He kept his pauses and breaths natural on the recordings, and made sure to use language he already uses, like “melody” or “bangers” rather than just “songs.” All of this is then fed into the AI model, which creates the AI voice.

The company has declined to detail the process further or say how long it took to turn Jernigan’s recordings into an AI DJ. But, given its potential interest in turning its podcast hosts into AI voice models, it must be developing a fairly efficient process here, and one that could possibly take advantage of a podcaster’s existing recordings.

While AI voices aren’t new, the ability to make them sound like real people is a more modern development. A few years ago, Google wowed the world with a human-like AI in Duplex that could call restaurants for reservations. But the technology was initially criticized for its lack of disclosure. This month, Apple introduced an accessibility feature, Personal Voice, that can mimic a user’s own voice after they first train the model by spending 15 minutes reading randomly chosen prompts, processed locally on their device.
