Why Sam Altman Cares So Much About Voice

By MIKE MAGEE

When OpenAI decided to respond to clamoring customers demanding voice-mediated interaction on ChatGPT, CEO Sam Altman went all in. That’s because he knew this was about more than competitive advantage or convenience. It was about relationships – deep, sturdy, loyal and committed relationships.

He likely was aware, as well, that the share of behavioral health in telemedicine-mediated care had risen from 1% in 2019 to 33% by 2022. And that the pandemic had triggered an explosion of virtual mental health services. In a single year, between 2020 and 2021, the share of psychologists offering both in-person and virtual sessions grew from 30% to 50%. Why? The American Psychological Association suggests these oral communications are personal, confidential, efficient and effective. Or in one word – useful.

As Forbes reported in 2021, “Celebrity endorsements, like Olympic swimmer Michael Phelps’ campaign with virtual therapy startup Talkspace, started to chip away at the long standing stigma, while mindfulness apps like Calm offered meditation sessions at the click of a button. But it was the Covid-19 pandemic and collective psychological fallout that finally mainstreamed mental health.” As proof, the magazine noted that mental health start-up funding had increased more than fivefold over the prior four years.

Altman was also tracking history. The first “mass medium” technology in the U.S. was voice-activated – the radio. He also understood its growth trajectory a century ago. From a presence in 1% of households in 1923, it became a fixture in three-quarters of all U.S. homes just 14 years later.

Altman also could see the writing on the wall. The up-and-coming generations, the ones that gently encouraged Biden to exit stage left, were both lonely and connected.

The most recent Nielsen and Edison Research data told him that the average adult in the U.S. now spends four hours a day consuming audio and its associated ads. Of that listening, 67% was on radio, 20% on podcasts, 10% on music streaming and 3% on satellite radio.

Post-pandemic, younger generations’ use of online audio had skyrocketed. In 2005, only 15% of young adults listened online. By 2023, that figure had reached 75%. And as their listening has risen, loneliness rates in young adults have declined from 38% in 2020 to 24% now.

A decade earlier, screenwriter Spike Jonze ventured into this territory when he wrote Her. Brilliantly cast, the film featured Joaquin Phoenix as lonely, introverted Theodore Twombly, reeling from an impending divorce. In desperation, he developed more than a relationship (a friendship really) with an empathetic, reassuring female AI, voiced by actress Scarlett Johansson.

Scarlett’s performance was so convincing that it catapulted Her into contention for five Academy Awards, winning Best Original Screenplay. It also apparently impressed Sam Altman, who, a decade later, approached Scarlett to be the “voice” of ChatGPT’s virtual lead. She declined, seeing the potential downside of becoming a virtual creature. He subsequently identified a “Scarlett-like” voice actor and chose “Sky” as one of five voice choices to embody ChatGPT. Under threat of a massive intellectual property challenge, Altman recently “killed off” Sky, but the other four virtual companions (out of 400 auditioned) have survived.

As for content, so that “what you say” is as well represented as “how you say it,” companies like Google have that covered. Their LLM (Large Language Model) product was trained on content from over 10 million websites, including HealthCommentary.org. Google engineer Blaise Aguera y Arcas says, “Artificial neural networks are making strides toward consciousness.”

Where this all ends up for the human race remains an open question. What is known is that the antidote for loneliness and isolation is relationships. But of what kind? Who knows? Oxford’s Evolutionary Psychologist Robin Dunbar believes he does.

Altman likely paid close attention to this review by Atlantic writer Sheon Han in 2021: “Robin Dunbar is best known for his namesake ‘Dunbar’s number,’ which he defines as the number of stable relationships people are cognitively able to maintain at once. (The proposed number is 150.) But after spending his decades-long career studying the complexities of friendship, he’s discovered many more numbers that shape our close relationships. For instance, Dunbar’s number turns out to be less like an absolute numerical threshold than a series of concentric circles, each standing for qualitatively different kinds of relationships.… All of these numbers (and many non-numeric insights about friendship) appear in his new book, Friends: Understanding the Power of Our Most Important Relationships.”

What many experts now agree on is that voice seems to be the key. Shorthand for Altman: Pick the right voice and you might just trigger the addition of 149 “friends” for each ChatGPT “buyer.”

Mike Magee MD is a Medical Historian and regular contributor to THCB. He is the author of CODE BLUE: Inside America’s Medical Industrial Complex (Grove/2020).
