Why 23andMe’s Genetic Data Could Be a ‘Gold Mine’ for AI Companies

The genetic testing company 23andMe, which holds the genetic data of 15 million people, declared bankruptcy on Sunday night after years of financial struggles. This means that all of that extremely personal user data could be up for sale, and the vast trove of genetic data could draw interest from AI companies looking to train their models, experts say.

“Data is the new oil—and this is very high quality oil,” says Subodha Kumar, a professor at the Fox School of Business at Temple University. “With the development of more and more complicated and rigorous algorithms, this is a gold mine for many companies.”

But any AI-related company attempting to acquire 23andMe would run significant reputational risks. Many people are horrified by the thought that they surrendered their genetic data to trace their ancestry, only for it to now be potentially used in ways they never consented to. 

“Anybody touching this data is running a risk,” Kumar, who is the director of Fox’s Center for Business Analytics and Disruptive Technologies, says. “But at the same time, not touching it, they might be losing on something big as well.” 

Read More: 23andMe Filed for Bankruptcy. What Does That Mean For Your Account?

Training LLMs

Companies like OpenAI and Google have poured time and resources into making an impact on the medical field, and 23andMe’s data trove may attract interest from large AI firms with the financial means to acquire it. 23andMe was valued at around $48 million this week, down from a peak of $6 billion in 2021.

These companies are striving to build the most powerful general purpose models possible, which are trained on vast amounts of granular data. But researchers have argued that high-quality data sources are drying up, which makes new and robust information sources all the more coveted. A TechCrunch survey of venture capitalists earlier this year found that more than half of respondents cited the “quality or rarity of their proprietary data” as the edge that AI startups have over their competition.

“I think it could be a really valuable data set for some of the big AI companies because it represents this ground truth data of actual genetic data,” Anna Kazlauskas, CEO of Open Data Labs, says of 23andMe. “Some of the human errors that might exist in bio publications, you could avoid.”

Kumar says that 23andMe’s data could be especially valuable to companies in their push for agentic AI, or AI systems that can perform tasks without human involvement, whether in medical research or company decision-making.

“The whole goal of agentic AI models has been a modular approach: you crack the smaller pieces of the problem and then you put them together,” he says. 

Representatives for Google and OpenAI did not immediately respond to requests for comment.

Industry-Based Value

23andMe’s data could also be valuable across different industries using AI to sort through vast amounts of data—first and foremost, medical research. 

23andMe already had agreements in place with pharmaceutical companies such as GlaxoSmithKline, which tapped into the company’s data sets in the hopes of developing new treatments for disease. Kumar says that at Temple, he and colleagues are working on a project to create personalized treatment for ovarian cancer patients, and have found that genetic data can be “very, very powerful in understanding structures that we were not able to understand.”

However, Alex Zhavoronkov, founder and CEO at Insilico Medicine, contends that 23andMe’s data may not be as valuable as some think, especially in relation to drug discovery. “Most low hanging fruits have already been picked up and there is significant data in the public domain published together with major academic papers,” he wrote in an email to TIME. 

But companies in many other industries will likely be interested, too. This is an abnormally large and nuanced data set: This amount of genetic data, especially that which comes with personal health and medical records, is rarely publicly accessible, says Anna Kazlauskas, CEO of Open Data Labs and the creator of Vana, a network for user-owned data. “All of that contextual data makes it really valuable—and hard data to get,” she says. 

Potentially interested parties include insurance companies, which could use the data to identify people with greater health risks in order to raise their premiums. Financial institutions could track the relationship between genetic markers and spending patterns in the process of assessing loans. And e-commerce companies could use the data to tailor ads to people with specific medical conditions.

Ethical and Privacy Concerns

But companies also face significant reputational risks in getting involved. 23andMe suffered a hack in 2023 that exposed the personal data of millions of users, severely hurting the company’s reputation. Bidders who come from other industries may have even less data protection than 23andMe did, Kumar says. “My worry is that some of the companies are not used to having this kind of data, and they may not have enough governance in place,” he says.

This is especially dangerous because genetic information is inherently sensitive and cannot be altered once compromised. The genetic information of family members of people who willingly gave their data to the company is also at risk. And given AI’s well-known biases, the misuse of such data could lead to discrimination in areas like hiring, insurance and loans. On Friday, California Attorney General Rob Bonta released an “urgent” alert to 23andMe customers advising them to ask the company to delete their data and destroy their genetic samples under a California privacy law.

Eva Galperin, director of cybersecurity at the Electronic Frontier Foundation, worries that 23andMe’s genetic data might exist in a state of permanent flux on the market. “Once you have sold the data, there are no limits to how many times it may be resold,” she says. This could result in genetic data falling into the hands of organizations that may not prioritize ethical considerations or have robust data protection measures in place.

Insilico Medicine’s Zhavoronkov says all of these fears mean that potential AI-related bidders will be dissuaded from trying to purchase 23andMe and its data. “Their dataset is actually toxic,” he says. “Whoever buys it and trains on it will get negative publicity, and the acquirer will be possibly investigated or sued.”

Regardless of what ultimately happens, Kazlauskas says she is at least thankful that this conundrum has opened up larger conversations about data sovereignty. “We should probably, in the future, want to avoid this kind of situation where you decide you want to do a genetic test, and then five years later, this company is struggling financially, and that now puts your genetic data at risk of being sold to the highest bidder,” she says. “In this AI era, that data is super valuable.”
