Shortly after California law enforcement officials announced that they’d tracked down a man they believe to be the Golden State Killer, it was reported that police had used public DNA databases to determine his identity.
The investigators apparently set up an account on a genealogy site using a fake name, and uploaded DNA sequences obtained from crime scenes years ago. Then they used the website’s software to search for people whose DNA indicated they might be related to the perpetrator — and came up with a list of families.
The police then cross-referenced information about members of those families with the likely demographics of the killer, and homed in on a suspect, Joseph James DeAngelo.
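The cross-referencing step can be pictured as a simple filter. The sketch below is only an illustration, not the investigators' actual method: the `Person` record, the `shortlist` function, the family names, and the demographic criteria (sex, birth-year range, California residence) are all hypothetical stand-ins for whatever "likely demographics" the police used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    family: str
    sex: str
    birth_year: int
    lived_in_california: bool


def shortlist(people, matched_families, sex, born_between, lived_in_california=True):
    """Keep only members of DNA-matched families who fit the assumed profile."""
    earliest, latest = born_between
    return [
        p for p in people
        if p.family in matched_families
        and p.sex == sex
        and earliest <= p.birth_year <= latest
        and p.lived_in_california == lived_in_california
    ]


# Entirely made-up family-tree records.
people = [
    Person("Person A", "Family 1", "M", 1946, True),
    Person("Person B", "Family 1", "F", 1948, True),
    Person("Person C", "Family 2", "M", 1985, True),
    Person("Person D", "Family 3", "M", 1947, True),   # family was not a DNA match
    Person("Person E", "Family 2", "M", 1950, False),  # never lived in California
]

# Hypothetical result of the relative search on the genealogy site.
matched_families = {"Family 1", "Family 2"}

suspects = shortlist(people, matched_families, sex="M", born_between=(1940, 1960))
print([p.name for p in suspects])
```

Note that nobody on the resulting shortlist needs to have uploaded their own DNA. A relative's upload is enough to put the family on the list.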
This extraordinary event highlights that when you send off a cheek swab to one of the private genome companies, you may sacrifice not just your own privacy but that of your family and relatives.
We may be glad that these heinous crimes may have been solved, but we must also be aware of the potential privacy risks this process reveals. Many Americans are rightly anxious about the misuse of their data by social media companies, but they should be at least as concerned about who has access to their genetic information.
Just as Facebook markets information to third-party companies, for-profit genome testing companies like 23andMe make money, in part, by selling anonymized genomic data. (California investigators reportedly used a lesser-known site, called GEDmatch.) Currently, the principal customers are pharmaceutical companies and biomedical researchers looking for new therapies.
Genetic information can be compromised in myriad ways
There are many ways this genomic information can be misused, especially if it can be tied back to the person it came from. Most people don’t realize that re-identifying genomes — identifying an individual from their genetic profile — is already possible. In one famous study, when provided with 10 anonymous genomes (just the sequence of bases), researchers could re-identify five.
This is worth reiterating: Given just a sequence of DNA bases — the adenines, thymines, guanines, and cytosines along the double helix — and no other information, it is now possible to work back to the specific person from whom it originated, out of the entire US population.
Humans share about 99 percent of their DNA bases with one another. The relatively few differences that exist are often enough to figure out who’s related to whom. And once you’ve tracked the genome back to one single person, you have lots of information about the genetics of that person and their relatives.
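To see why a small number of differences can point to relatives, consider a toy comparison. This is a deliberately simplified sketch: the profiles are made-up strings of bases at a dozen positions assumed to vary between people, and `shared_fraction` is a hypothetical similarity score. Real kinship analysis relies on vastly more markers and on statistical models, not a simple match count.

```python
def shared_fraction(a: str, b: str) -> float:
    """Fraction of compared positions at which two profiles carry the same base."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same positions")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up bases at 12 positions that are assumed to differ between people.
unknown_sample = "AGCTTAGGCATC"

database = {
    "relative":   "AGCTTAGGCTTA",
    "stranger_1": "TGCATAGCCATG",
    "stranger_2": "ACGTAAGTCGTC",
}

# Rank database profiles from most to least similar to the unknown sample.
ranked = sorted(
    ((name, shared_fraction(unknown_sample, profile)) for name, profile in database.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy data the profile labeled "relative" matches the unknown sample at more positions than either stranger does, which is the kind of signal a relative search looks for.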
So far, genetic sequencing has produced few medical breakthroughs. Physicians can’t do much with the information that a given patient has, say, a 3 percent greater risk of dementia. But that data is potentially very useful to insurance companies and to employers trying to lower their health care costs.
The Genetic Information Nondiscrimination Act, passed in 2008, prevents insurance companies and employers from forcing people to undergo genetic testing and bans companies from hiring or firing people based on genetics. But it doesn’t necessarily prevent bad actors from discreetly using advanced analytics to give themselves a commercial edge.
Members of Congress have already tried to remove some of the little genetic privacy protection that exists: One proposed bill would have let companies levy extra fees on employees who refuse to submit to genetic testing.
Recently, some companies have begun to offer genome sequencing as an employee benefit. While the prospect of finding out that you are a caffeine “fast metabolizer” or part Native American might seem attractive, employees might not grasp the risks.
The financial services industry offers a cautionary tale for the customers of the genome industry. Banks are highly regulated and supposed to provide state-of-the-art protection to their customers, yet they have been hacked. Compared to financial institutions, genome companies are lightly regulated, and therefore likely more vulnerable.
While the California investigators who may have found the Golden State Killer concluded that their novel approach was legal, courts have barely begun to explore these issues. At a minimum, the police may have violated the ancestry website’s terms of service, placing the data they obtained in a legal gray zone.
Given the large financial rewards and demonstrable vulnerability of databases, millions of American families should consider their genomic privacy already compromised. (And if the genome of one of your relatives is in one of these databases, yours might as well be too.)
The huge chasm between academic and commercial oversight
Commercial genetics companies lag way behind their academic counterparts when it comes to protecting people’s privacy. Biomedical researchers must meet rigorous standards if they want to sequence the DNA of research subjects. They get permission from multiple committees, obtain full patient consent, and are required to store data on extremely secure computers.
Meanwhile, the genomics and ancestor-searching industry encourages people to upload their genomes to public databases with minimal safeguards and no meaningful consent regarding how that information will be used.
Buried in the terms of service is often a provision giving the company the right to own and sell all or part of the person’s genome without compensation.
One odd side effect of widespread genomic testing is that it has also created problems for some insurance companies. When people learn they have excess risk of a disease like Alzheimer’s, many will rush out to buy long-term care insurance, undercutting the actuarial models of these insurance companies. If taken to an extreme, such behavior could send long-term care insurance pools into death spirals.
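The death-spiral dynamic can be made concrete with a little arithmetic. The sketch below rests on loud assumptions: every figure is invented, the insurer is assumed to charge a break-even premium equal to the average expected cost of whoever is in the pool, and each person is assumed to know their own risk (say, from a genetic test) and to drop out once the premium exceeds 1.5 times their own expected cost. The function `simulate_pool` and its `tolerance` parameter are hypothetical.

```python
def simulate_pool(expected_costs, tolerance=1.5, max_rounds=20):
    """Return a list of (premium, members) pairs, one per pricing round."""
    pool = list(expected_costs)
    history = []
    for _ in range(max_rounds):
        if not pool:
            break
        premium = sum(pool) / len(pool)  # break-even premium for the current pool
        history.append((premium, len(pool)))
        # People stay only if the premium is at most `tolerance` times their own cost.
        stayers = [cost for cost in pool if premium <= tolerance * cost]
        if len(stayers) == len(pool):
            break  # nobody left this round, so the pool has stabilized
        pool = stayers
    return history


# Made-up pool: 70 low-risk, 20 medium-risk, and 10 high-risk members,
# described by their expected annual cost in dollars.
costs = [1_000] * 70 + [3_000] * 20 + [10_000] * 10

history = simulate_pool(costs)
for round_number, (premium, members) in enumerate(history, start=1):
    print(f"round {round_number}: premium ${premium:,.0f}, members {members}")
```

With these invented numbers the premium climbs each round while membership shrinks, until only the highest-risk members remain and coverage costs as much as the care itself.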
The Golden State Killer case is the tip of an iceberg: proof of principle that any crime leaving behind biological material from which genomic data can be extracted is likely solvable. Until recently, law enforcement was limited to databases of DNA obtained from convicted felons. If they now access other repositories of genomic information, including the huge commercial ones, then a sizable fraction of the long-stored rape kit results could likely be tied to specific perpetrators.
The technique used in California is currently laborious and expensive. But approaches making use of machine learning and other technology could speed it up. That could mean more solved crimes, but also potential false positives.
Where are we headed?
First, we have to acknowledge that the horse is out of the barn. Millions of Americans have sent specimens to for-profit and not-for-profit genetic and genealogical companies; millions of others have participated in research, have been screened as organ donors, or have passed through the criminal justice system.
Preventing this data from getting into the hands of people with malign intentions will be as hard as stopping identity theft. One very plausible scenario is that most or all of it will become available in a genomic version of the dark web.
If you happen to be a member of one of the few families in which no member has yet sent off a cheek swab, you might want to consider opting out until society sorts out the risks and benefits, and provides more privacy protections.
Most people, however, will have to wait passively and hope they will not be harmed by a genomic revolution that, while it may offer significant dividends down the road, has so far provided them with little benefit. Better regulation of the industry would be a start. But realistically speaking, any protections would be partial at best.
Norman A. Paradis, MD, is a professor of medicine at Dartmouth College and the director of emergency medicine research at Dartmouth Hitchcock Medical Center. A shorter version of this article was originally published by The Conversation.
The Big Idea is Vox’s home for smart discussion of the most important issues and ideas in politics, science, and culture — typically by outside contributors. If you have an idea for a piece, pitch us at email@example.com.