The Rise of Artificial Voices and How We Perceive Them

This summer, the Dutch regional radio station RTV Drenthe hired a new newsreader: the news bulletins on Saturday and Sunday, still written by human journalists, are now read by an artificial voice. The regional television station Omroep Brabant has gone one step further and created a computer version of one of its presenters, Nina van den Broek. This AI presenter primarily appears in short explanatory videos on the station’s website, as a supplement to news items. So far, this is still experimental, but clearly, the broadcaster wants to move with the times.

These are just two recent examples of the use of synthetic, AI-generated voices in Dutch news media.1 I suspect this is the start of an almost unstoppable trend, and that we will soon no longer be able to imagine what life was like without these technologies. The benefits are obvious: the voices are always available, immune to fatigue, multilingual, consistent, and customisable. Voiceovers can be generated quickly and can make the news more accessible by reading reports aloud, optionally in simplified language (which is actually one of Omroep Brabant’s main motivations).
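To give a sense of how low the barrier has become, here is a minimal sketch in Python using the open-source gTTS library, chosen purely for illustration (broadcasters use far more sophisticated, often cloned, voices, and the bulletin text below is made up):

```python
# Minimal illustration: turn a written bulletin into spoken audio.
# gTTS uses a generic text-to-speech voice, not a cloned presenter.
from gtts import gTTS

bulletin = "Goedemiddag. Dit is het nieuws van vandaag."  # hypothetical bulletin text
tts = gTTS(text=bulletin, lang="nl")  # "nl" selects a Dutch voice
tts.save("bulletin.mp3")              # ready-to-play audio file
```

A few lines of code and freely available tooling are enough to produce a passable voiceover, which is exactly why newsrooms are experimenting with it.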

More generally, AI-generated voice assistants can guide us through traffic or serve as a language buddy that gives feedback on, for example, pronunciation and rhythm; we encounter them in ads and when calling a service hotline. Personally, I am most impressed by the fact that synthetic voices can serve as voice replacements for people who have lost their voice due to medical conditions or treatments: AI can create a personalised synthetic voice that closely resembles the individual’s original voice.

On the other hand, the use of AI-generated voices has downsides and is not without risks. AI-Nina provoked a lot of reactions: people are excited but also very concerned about potential job losses. Although Omroep Brabant reassured the public that the computer version won’t replace the ‘real’ Nina anytime soon, voice actors might well become superfluous in the coming years. Another big problem is, of course, the unethical use of voice cloning, for example in deepfakes. Even when it is not misused to create misleading content or spread disinformation, it can be unsettling that something as inherently personal as one’s own voice is suddenly not that unique any more. You might have heard of the clash between Scarlett Johansson and OpenAI a couple of months ago. Ultimately, OpenAI suspended the voice that sounded strikingly similar to the actress, but this certainly won’t be the last time we discuss voice ownership.

Another issue that receives less attention is that synthetic voices have the potential to reinforce voice-based gender stereotypes. The study by Shiramizu et al. (2022) presents evidence that social perceptions of synthetic voices are influenced by the same factors and power dynamics as perceptions of human voices. Unsurprisingly, many companies already engage in voice marketing.

Although often still small-scale and exploratory in nature, more and more research is being done on the perception of artificial voices. Can humans still distinguish between human and artificial voices? Do AI-generated voices produce the Uncanny Valley phenomenon? This effect refers to the feelings of eeriness or discomfort that people experience when they encounter a synthetic entity, such as a robot, that closely resembles a human but is not quite perfect. And if so (to date, research is inconclusive), what does that imply for, say, the trust people place in the credibility of news presented by AI?

The Uncanny Valley effect (Smurrayinchester, CC BY-SA 3.0, via Wikimedia Commons)

Of course, there’s so much more to explore. I want to highlight just one other aspect, namely how our brains react to artificial voices. A current research project suggests that while people cannot tell human and AI voices apart, their brains respond differently! Human voices trigger neural responses related to empathy and memory, whereas AI voices activate brain regions related to error detection and attention regulation. And Gong (2023) found that newscasts read by a human voice induced a stronger reaction in the brain than those read by an AI-synthesised voice. Hence his question: do AI voices reduce cognitive activity?

While I am excited about future developments in artificial voice technology, I am also concerned, because we know so little about how our brains react to these voices and about the possible effects on our communication with fellow human beings as well as with AI assistants. Therefore, and in line with my previous blogs: let’s not forget to reflect on our use of these technologies, and let’s not become too dependent on them before it is too late.

References

Gong, C. (2023). AI voices reduce cognitive activity? A psychophysiological study of the media effect of AI and human newscasts in Chinese journalism. Frontiers in Psychology, 14, 1243078. https://doi.org/10.3389/fpsyg.2023.1243078

Shiramizu, V. K. M., Lee, A. J., Altenburg, D., Feinberg, D. R., & Jones, B. C. (2022). The role of valence, dominance, and pitch in perceptions of artificial intelligence (AI) conversational agents’ voices. Scientific Reports, 12(1), 22479. https://doi.org/10.1038/s41598-022-27124-8

  1. In this blog, I don’t distinguish between ‘synthetic voice’ and ‘AI-generated voice’ and use these terms interchangeably, although some traditional speech synthesis methods do not necessarily rely on artificial intelligence. ↩︎