Whole genome sequencing (WGS) is now available and even (relatively) affordable, especially during Black Friday sales. Should you do it?
Black Friday is coming. Like Amazon and Walmart, the consumer genetics industry throws its hat in the ring with killer deals that you won’t see at any other time. One of the emerging products on the market is none other than the juggernaut of personal genomics: Whole Genome Sequencing (WGS). Yes, all 3 billion base pairs of your own sequenced DNA can be yours, delivered on disk or download. (Unless you decide to use your splurge money on another Instant Pot.)
Whole genome sequencing (WGS): Really??
Can we just pause for a moment and acknowledge that this is pretty amazing? Today, in 2019, there are about 1 million people globally who have been sequenced. A million! Twenty years ago, whole consortia of scientists were only about halfway through the Human Genome Project. Taking on the mammoth task of determining the order of all the As, Ts, Cs and Gs in every chromosome from top-to-bottom for the first time was an ambitious and inspiring undertaking. It made us gawk and guess at what the future could hold when the secrets of the genome could be laid out before us in plain sight. And now here we sit today, in the future, where instead of costing $2.7 billion and taking 13 years to complete, a full genome can be sequenced for $1,000 in a couple of days.
Somehow offering the human genome up for sale on Black Friday cheapens the concept for me a bit, as if it could be for sale. Could we just as soon add Mt. Everest or the Mona Lisa to an internet check-out cart? The elegant code that since the very beginning has embodied all the life, diversity and biological resiliency we see on this planet has reached the status of a commodity. But in doing so, it has the reach to affect the perspective we take on human origins, health, and relationships for the entire planet. Quite the reach!
Undoubtedly, WGS will become the standard of care for medical genetics, and the accessible, definitive method for forensics, anthropology and ancestry that SNP testing is today. But before you click ‘Buy Now’, here are some questions to consider if you’re thinking about being an early adopter of unveiling your entire genome.
How are WGS results delivered?
If you’ve taken an ancestry test from 23andMe or the like, you’ve probably received some useful take-aways: cousin matches, trait predictions, and an ethnicity analysis. The interpretation of your raw DNA is included as part of the service you purchased. The companies using SNP data have built a robust infrastructure of databases that power their predictions, with millions of members tested on the same SNPs.
When you buy a WGS, you’ll be among the first to pocket your own sequence, but you’ll also have a ton of data with very limited tools to interpret it. A powerful infrastructure of public WGS databases with millions of members simply does not yet exist.
What do you get with whole genome sequencing?
Instead of your sequencing results being delivered as a fully interpreted ancestry report, you’ll get a 200 GB file full of raw DNA data. It’s not the actual straight-up 3 billion base pairs; that would be pretty long and boring. Look for data to come in one of three formats: FASTQ, BAM and VCF.
FASTQ is the data format that comes straight off of the machines that produce the sequence reads. The sequence letter and an associated score that measures the quality of the call are encoded together, making for one big baffling dump of data, uninterpretable to the naked eye.
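To make the format a little less baffling, here is a minimal Python sketch that decodes a single FASTQ record. It assumes the common Phred+33 quality encoding; the record is made up for illustration, and `parse_fastq_record` is a hypothetical helper name, not part of any vendor's tooling.

```python
def parse_fastq_record(lines):
    """Decode one four-line FASTQ record (assumes Phred+33 quality encoding)."""
    header, bases, _separator, quality = [line.strip() for line in lines]
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    if len(bases) != len(quality):
        raise ValueError("sequence and quality strings must be the same length")
    # Each quality character encodes a Phred score Q = ord(char) - 33.
    scores = [ord(ch) - 33 for ch in quality]
    # Q maps to an estimated error probability p = 10 ** (-Q / 10).
    error_probs = [10 ** (-q / 10) for q in scores]
    return header[1:], list(zip(bases, scores, error_probs))


# A made-up record, for illustration only.
record = ["@read_001", "GATTACA", "+", "IIII5+!"]
read_id, calls = parse_fastq_record(record)
for base, score, p_err in calls:
    print(f"{base}  Q{score:<2}  estimated error probability {p_err:.4f}")
```

Each letter is paired with its own score, so a Q40 call is estimated to be wrong about once in 10,000 times, while a Q10 call is wrong about once in 10.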
The BAM (Binary Alignment Map) file is also made available to clients. Genomes are sequenced in a myriad of small broken up pieces and then assembled by matching the fragments to an intact reference sequence, which is a sequence that academics have established as a standard for comparison.
When you assemble a jigsaw puzzle, the picture on the box can help you figure out how to arrange the dismantled pieces. The reference sequence acts as the ‘picture on the box’ and becomes the map for assembling the fragmented pieces into one unified sequence. A BAM file is the product of mapping the fragments into one continuous sequence, and is also compressed by encoding into binary for significant space savings. Again, this is a huge data file with no real meaning that can be derived just from viewing it in isolation.
Clients can also receive their file in VCF, Variant Call Format. Comparing one human genome to another results in agreement of over 99.4%. We are all astonishingly similar to one another when it comes to our genetics. Isn’t it incredible that it is the 0.6% difference that drives all of the stunning diversity we see in humanity? A VCF file catalogs just those differences, leaving out everything else that is the same. Compared to the reference sequence, it gives base pair information for the points at which a mutation is detected in a client, and the nature of the mutation.
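For a sense of what one entry in a VCF file looks like, the sketch below parses a single data line for one sample. The column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then the sample) follows the VCF specification; the variant itself, including the rsID, is invented for illustration, and `parse_vcf_line` is a hypothetical helper.

```python
def parse_vcf_line(line):
    """Parse one single-sample VCF data line into a small dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rsid, ref, alt = fields[:5]
    # FORMAT (column 9) names the per-sample keys; column 10 holds the values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    # GT indexes alleles: 0 = reference, 1 = first ALT allele, and so on.
    alleles = [ref] + alt.split(",")
    indexes = sample["GT"].replace("|", "/").split("/")
    genotype = [alleles[int(i)] if i != "." else "?" for i in indexes]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": rsid,
        "ref": ref,
        "alt": alt.split(","),
        "genotype": genotype,
    }


# A made-up heterozygous variant: reference A, alternate G.
example = "1\t123456\trs0000001\tA\tG\t50\tPASS\t.\tGT\t0/1"
variant = parse_vcf_line(example)
print(variant)
```

Here the genotype `0/1` means one copy matches the reference (A) and the other carries the variant (G).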
What can WGS tell you?
To be clear, reporting offerings do vary by company but they generally provide a summary of your mutations and interpretations of health risk that may be elevated as a result. These reports are heavily weighted toward health applications, with some ancestry information in there too. Some clients have reported they have found it useful to further consult a genetic counselor to understand if any action is needed to address conditions highlighted in their report.
This is all certainly a state-of-the-art experience, but just remember that the amount of reference data that backs up interpretations derived from WGS data is still in the beginning phases of being built. If you are interested primarily in genetic ancestry, rather than precision medicine, you’ll get a greater bang for your buck by just sticking with the more highly developed autosomal tests at this point. Their databases are huge, and are well out of beta stage, with several years of refinement to their reference populations and computational methods.
What type of sequencing test should I choose?
There are several choices on the consumer market today.
Whole exome sequencing generates data for only the portion of the genome (about 1%) that codes for the proteins that drive the cellular processes that make our bodies work. Sequencing the exome is only used for medical inquiries, not ancestry.
The level of sequencing coverage is an important choice. With the current sequencing process, the accuracy rate for determining each individual base pair letter is very high. But because the genome is so large (3.2 billion base pairs), even with accuracy at 99.99% for each base pair, on average you’ll still end up with 320,000 errors. If that sounds like a lot to you, you’re right. To combat this, the sequencing process is performed multiple times across the whole genome, and differences at individual base pair positions are reconciled across the multiple reads.
- 30x coverage is medical grade sequencing. If you want to use your DNA data to make medical decisions, this is the sequencing depth you want. 30x means that on average each base pair has been read 30 times before generating your final sequencing report. As you can imagine, extra reads mean extra cost. These tests regularly ring up in the $500-$1000 range. This is where some incredible Black Friday offers were seen last year: $199.
- 0.4x coverage is really a partial read of the genome (about 40%) that is supplemented by imputation, a statistical method that infers the sequence of the gaps in the data (the remaining 60%) rather than reading it directly. The gaps are filled in by consulting databases of full human sequences and inferring the missing sequence from the mutations seen in the directly read fragments. This is possible because sets of mutations often occur together. Accuracy is reported at 99%. This won’t fly for medical inquiries; for ancestry purposes the jury is out on whether this is “good enough.” Low-pass coverage like this makes a $99-$149 price tag possible.
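The arithmetic behind these coverage numbers can be checked in a few lines of Python. The first part reproduces the 320,000-error figure quoted above. The second part is an idealized assumption that is not part of any company's stated method: it treats reads as landing uniformly at random (a simple Poisson model) to estimate what fraction of positions gets read at least once at a given average depth.

```python
import math

GENOME_SIZE = 3_200_000_000   # base pairs, as quoted above
PER_BASE_ACCURACY = 0.9999    # 99.99%

# Expected number of wrong letters if every position is read exactly once.
expected_errors = GENOME_SIZE * (1 - PER_BASE_ACCURACY)
print(f"Expected errors from a single pass: {expected_errors:,.0f}")


def fraction_covered(mean_depth):
    """Fraction of positions read at least once, assuming Poisson-distributed depth."""
    return 1 - math.exp(-mean_depth)


for depth in (0.4, 30):
    print(f"{depth}x average depth: {fraction_covered(depth):.1%} of positions read at least once")
```

Under this simplified model, low-pass sequencing directly reads only a minority of positions, in the same rough ballpark as the figure above, while 30x leaves essentially nothing unread. Real sequencing is less uniform than the model assumes, so treat the output as an approximation.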
What tools are available for interpretation of WGS data for ancestry purposes?
If your primary interest is relative matching, there are currently no WGS companies that offer this service. Stick with autosomal testing for now. As WGS databases are developed, the ability to detect beyond 4th degree relatives will be enhanced compared to the matching resolution available with SNP methods. That’s exciting.
As we’ve said, there is some limited interpretation provided for learning more about ethnic ancestry from WGS data, but if you have some hacker skills, it could be fun to take it to the next level. Most of the useful interpretations will come from computationally extracting SNPs used by AncestryDNA, 23andMe, etc., from the WGS data and then uploading to GEDmatch or other utility for comparisons (but if this is your end goal, it’s a lot easier just to stick with autosomal testing in the first place).
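As a rough illustration of that extraction step, the sketch below filters VCF lines down to a list of target SNPs and prints them in a tab-separated layout of rsID, chromosome, position and genotype, modeled on the 23andMe raw-data download. The rsIDs, positions and the helper name `vcf_to_snp_rows` are hypothetical. A real pipeline would also need to fill in the positions where you match the reference, since a VCF normally lists only differences, and you would need to confirm what a given upload site actually accepts.

```python
def vcf_to_snp_rows(vcf_lines, target_rsids):
    """Yield (rsid, chrom, pos, genotype) for single-base variants in target_rsids."""
    for line in vcf_lines:
        if line.startswith("#"):            # skip VCF header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, rsid, ref, alt = fields[:5]
        if rsid not in target_rsids:
            continue
        alleles = [ref] + alt.split(",")
        if any(len(allele) != 1 for allele in alleles):
            continue                        # keep simple SNPs, skip insertions/deletions
        format_keys = fields[8].split(":")
        gt = fields[9].split(":")[format_keys.index("GT")]
        indexes = gt.replace("|", "/").split("/")
        if "." in indexes:
            continue                        # skip no-calls in this sketch
        genotype = "".join(alleles[int(i)] for i in indexes)
        if chrom.startswith("chr"):
            chrom = chrom[3:]               # "chr1" -> "1"
        yield rsid, chrom, pos, genotype


# Made-up VCF content and target list, for illustration only.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t1000\trs0000001\tA\tG\t50\tPASS\t.\tGT\t0/1",
    "chr2\t2000\trs0000002\tC\tT\t50\tPASS\t.\tGT\t1/1",
    "chr3\t3000\trs0000003\tG\tGA\t50\tPASS\t.\tGT\t0/1",
]
targets = {"rs0000001", "rs0000003"}

rows = list(vcf_to_snp_rows(vcf, targets))
for row in rows:
    print("\t".join(row))
```

In this toy input, only the first variant survives: the second is not on the target list, and the third is an insertion rather than a single-letter SNP.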
Just to get your creative juices flowing, it is possible to detect new SNPs in your DNA that aren’t used by any of the mainstream companies. But further interpretation of these is mostly available for the Y chromosome: for example, to further resolve your Y chromosome haplogroup. Feasibly, you could also extract STR data for which there are also comparison databases.
Debbie Kennett gives a great discussion of some of the resources for further dissecting your WGS data, and the ISOGG wiki also lists many web-based genealogical tools that can further your raw DNA data inquiries. So dust off those hacker skills!
What about scientific research and privacy?
Perhaps the most altruistic reason for having your genome sequenced at this point in history is to allow your data to be part of developing the science. Companies allow you to opt in and out of various research initiatives. Most are aimed at health and wellness. Before participating, understand whether the organizations are HIPAA-compliant and if they utilize protected and encrypted information sharing. Also understand that, whatever valuable information you receive personally from the company, and especially if you are receiving a below-market price for your test, the company’s motivation for this incentive is the possibility of curating your DNA for further research.
The misuse of sensitive personal information is a primary reason why many abstain from any kind of genetic testing at all. Companies certainly understand this, and want to entice participants by having strong privacy practices in place. I personally take companies at their word that in good faith they have no intent to violate my wishes for the use of my DNA. I also recognize from experience though, that DNA collections and data can change hands down the road, and the original agreement between client and research institution may not remain relevant. It’s a brave new world out there!
How much do you want to know?
Are you pretty laid back about your health? Or do you find it’s easy for you to spin out in anxiety over micro-symptoms? It is a supremely personal decision for every individual faced with the choice of actively prognosticating the level of health they may enjoy in their future. These tests are marketed with the stance of proactively managing your health by understanding your personal risk for developing disease. If you know breast cancer runs in your family, for example, genetic testing could allow you to understand if you have specific mutations that are associated with this type of cancer. Some patients then are able to receive preventative procedures or drugs that greatly mitigate their risk for developing breast cancer, and they consider this lifesaving.
However, results of genetic tests are not always clear cut and can leave you in the land of ambiguity, not fully understanding if your risk is truly elevated or not. This may not be a very peaceful place for some. Understand before going into a WGS testing experience what diseases you may be screened for, and if the genes contributing to the disease are well understood or if the test is likely to produce an ambiguous result.
Another situation is the one faced by James Watson, one of the scientists who received the Nobel Prize for the 1953 elucidation of the structure of DNA. In 2008, when he was 79 years old, a research group invited him to be the second person ever to have his genome sequenced. He agreed to do so and to have his sequence made publicly available for research, but asked not to be made aware of gene information about Apolipoprotein E, which is associated with Late-Onset Alzheimer’s Disease. The disease is incurable and claimed his grandmother. Dr. Watson, an expert in human genetics, preferred to remain in the dark on whether or not he was likely to develop incurable Alzheimer’s. For him, that was more than he wanted to know.
This is an important question for all of us to ask when faced with the idea of laying bare everything encoded in our genome. How much do you want to know, and under what conditions would you prefer to remain unaware of future possibilities?
It’s quite a bit to consider before loading up your Black Friday shopping cart, don’t you think? Whether you decide to be an intrepid early adopter of WGS, or you know right off the bat it’s just not for you, we can all be grateful to those who decide to make their genomes available to drive scientific discovery forward. All of us today who receive medical care or run a DNA cousin match test stand on the shoulders of the people who made beneficent contributions of their biology to scientific research. Our children’s generation and those that follow will benefit from the discoveries made today. Will you be a part of this, too?
Take the next step
It’s time to take the next step in your DNA journey. My Autosomal DNA quick reference guide takes you through the process of testing at 23andMe, AncestryDNA, or other autosomal DNA testing companies. Learn when to test; what the test could tell you; a detailed comparison of the different testing companies; how to understand and use your ethnicity results; and more. This inexpensive purchase can save you TONS of money by helping you purchase the right test—and then use it effectively. It’s got an exclusive side-by-side comparison table of the 5 major testing companies. It answers key questions, like what DNA testing can (and can’t) tell you, and choices for controlling your privacy.
Out of date. In 2021, sequencing.com provides a subset file that is compatible with third party services such as GEDMatch.