Ethnicity estimates in DNA testing are becoming more precise, accurate and consistent—but they’re still not equally robust across the market. Here’s an expert (and detailed) answer to this important question!
DNA ethnicity estimates: Performance varies
This article will teach you how to be a next-level consumer of the genetic autosomal ethnicity tests on the market today. The public suspects it, and the internal validations of ancestry testing companies back it up: the performance of ethnicity prediction tests is not equally robust across the market. Performance varies from company to company and, importantly, is not uniform for all populations even within a single company’s genetic ethnicity panel.
This matters to potential clients. As a discerning consumer, you may choose one company over the others depending on its performance statistics for the particular ethnicities that apply to you (e.g., if you’re specifically interested in Nigerian ancestry, you may not want to choose a company that has tepid performance measures for that region). Perceptive clients may also use the same measures to help interpret their genetic ancestry reports, using knowledge of region-specific performance to weigh the validity of the predictions they receive.
Sorting DNA ethnicities: An M&M analogy
We’re going to get to the specific performance measures that are indicators of the robustness of the ethnicity estimation process, but first we have to talk about M&Ms and specifically a mythical M&M sorting machine. It’s powered by state-of-the-art software, precision robotics and a highly tuned optical detection system. Top of the line. And its only job is to sort M&Ms by color into neat little piles. Haven’t you always wanted one of those? You pour a bag of entirely mixed up M&Ms into an opening in the top, and one by one it groups each delectable candy morsel with its same-color comrades on your table top. Let’s check out the results.
The piles are nice and neat, all the colors are pleasingly grouped. Very nice. I give that M&M sorting machine an A+ for performance.
Even though this first run delivered a perfect performance, the machine may not always get it right. Today we’re interested in two different ways that its performance in recognizing the color of the M&Ms can decline. A statistician watching this process would use the terms precision and recall to measure how well the machine identifies the correct color of each M&M. To start with, we’re going to designate a scenario where we focus on green as the troublemaker color. Here’s the next run of our M&M machine:
It looks like the machine didn’t get it quite right this time. All of the M&Ms ended up in the right pile, except for a stray red, yellow and blue that were all grouped with the green pile. The machine mistook these errant M&Ms for green. When other colors are misidentified as green like this, the machine is said to have lower precision in recognizing green M&Ms than it had in the initial perfect run. [Precision is explicitly calculated and expressed as a percentage. In the first perfect run, precision and recall are both 100%. In this second run, the number of greens was overestimated, so there is lower precision for green. The machine identified 15 M&Ms as green, but only 12 of them truly were. Precision = 12/15 = 0.80 = 80%.]
A somewhat converse experience can occur as well, where green M&Ms can be mistaken for other colors.
When M&Ms that are green are misidentified as a different color, the machine is said to have lower recall for green. [Recall is also expressed as a percentage. The number of greens in this run was underestimated, so the machine’s detection of green exhibits lower recall. The machine identified only 9 greens, but there were actually 12. Recall = 9/12 = 0.75 = 75%.]
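If you like to see the arithmetic spelled out, here is a minimal Python sketch of the two bracketed calculations above. The function and variable names are my own, chosen for illustration; only the counts (12 of 15, and 9 of 12) come from the two runs described.

```python
def precision(correct_in_pile: int, pile_size: int) -> float:
    """Share of the M&Ms placed in a color's pile that really are that color."""
    return correct_in_pile / pile_size


def recall(correct_in_pile: int, true_total: int) -> float:
    """Share of all the M&Ms of a color that landed in that color's pile."""
    return correct_in_pile / true_total


# Second run: 15 M&Ms were called green, but only 12 of them really were.
green_precision = precision(correct_in_pile=12, pile_size=15)

# Third run: the bag held 12 greens, but only 9 were called green.
green_recall = recall(correct_in_pile=9, true_total=12)

print(f"precision = {green_precision:.0%}, recall = {green_recall:.0%}")
# prints: precision = 80%, recall = 75%
```

Notice that the two measures share the same numerator (the M&Ms that were sorted correctly) and differ only in what they divide by: the size of the pile for precision, and the true number of that color for recall.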
Precision and recall are not just useful for evaluating piles of M&Ms, but are utilized heavily in assessing the efficacy of ethnicity estimation processes. Among other measures, they can give a valuable snapshot of how well the process performs on a case-by-case basis across the different populations the companies support.
There are variations in the exact methods that companies use to extract these indicators from their data, but they begin with a collection of individuals of known and unmixed ethnicity, used to test each of the genetic populations supported by their reference panels. Since by definition these known individuals are from a single population, a perfect performance of the ethnicity estimation process would assign them 100% into their original population. This is not always the case, and the companies closely evaluate the nature of the misassignments using precision and recall statistics, as this gives an indication of how their process would be expected to perform for commercial clients.
Just like the M&M example above, populations that are prone to misassigned individuals exhibit this tendency in two ways. If we are interested in examining the performance of Sweden in ethnicity estimation, lower precision would show up as individuals who are known to be from other nearby countries (say Denmark, Norway or Finland) being assigned to the Swedish cluster.
Misassignments do not necessarily always happen between adjacent regions, but this is often the case.
In the case of lower recall for Sweden, individuals that are known to be from Sweden are misidentified as having ancestry from other regions (perhaps Finland, Norway or Denmark).
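To make the Sweden example concrete, here is a small Python sketch. Everything in it is hypothetical: the counts are invented, and I am assuming, for simplicity, that each test individual is assigned to a single best-matching population, whereas real ethnicity estimates are reported as percentages. It is not any company’s actual validation code.

```python
from collections import Counter

# Hypothetical validation panel: (known origin, population assigned by the test).
results = (
    [("Sweden", "Sweden")] * 8
    + [("Sweden", "Norway"), ("Sweden", "Finland")]  # Swedes sent elsewhere
    + [("Norway", "Norway")] * 9
    + [("Norway", "Sweden")]                         # a Norwegian called Swedish
    + [("Finland", "Finland")] * 7
    + [("Finland", "Sweden")] * 3                    # Finns called Swedish
)


def precision_recall(results, population):
    """Return (precision, recall) for one population from (known, assigned) pairs."""
    counts = Counter(results)
    hits = counts[(population, population)]
    # Everyone placed in this population's cluster, rightly or wrongly.
    assigned = sum(n for (_, got), n in counts.items() if got == population)
    # Everyone who is truly from this population, wherever they were placed.
    known = sum(n for (true, _), n in counts.items() if true == population)
    return hits / assigned, hits / known


sweden_precision, sweden_recall = precision_recall(results, "Sweden")
print(f"Sweden: precision {sweden_precision:.0%}, recall {sweden_recall:.0%}")
# prints: Sweden: precision 67%, recall 80%
```

In this made-up panel, the Norwegian and the three Finns who land in the Swedish cluster pull precision down, while the two Swedes who are sent to Norway and Finland pull recall down.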
In the course of this validation process, the companies may go through many iterations of testing where they merge regions that consistently and reciprocally misassign test individuals, split other regions that show genetic distinctness, and otherwise redefine geographic boundaries for their proposed genetic populations (e.g., AncestryDNA currently combines NW Continental Europe and England into a single genetic population, at least in part because of the low precision and recall obtained in internal validations when attempting to split those regions into autonomous genetic groups). Precision and recall measures, among other tests, guide each round of evaluation until they come to an arrangement where the statistics indicate that the ability to detect true genetic ancestry is maximized, given the characteristics of the reference panel.
The output from these rounds of testing is complex, and precision and recall numbers are used heavily to sort out the successes or problems with each region. In contrast to our first M&M examples where most colors were nicely sorted together in their target pile, much of the output from these testing runs can seem thoroughly mixed up at first glance, requiring some tenacity to detect underlying trends.
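As a rough illustration of what "consistently and reciprocally misassign" could look like in numbers, here is a hypothetical Python sketch. The region names, the counts and the 25% threshold are all invented for the example, and the rule itself is my own simplification. It does not describe AncestryDNA’s or any other company’s actual procedure.

```python
from itertools import combinations

# Hypothetical counts: confusion[known][assigned] = number of test individuals.
confusion = {
    "Region A": {"Region A": 55, "Region B": 40, "Region C": 5},
    "Region B": {"Region A": 35, "Region B": 60, "Region C": 5},
    "Region C": {"Region A": 4, "Region B": 6, "Region C": 90},
}


def misassignment_rate(confusion, known, assigned):
    """Fraction of individuals from `known` who were placed in `assigned`."""
    row = confusion[known]
    return row.get(assigned, 0) / sum(row.values())


def merge_candidates(confusion, threshold=0.25):
    """Pairs of regions that each lose at least `threshold` of their people to the other."""
    pairs = []
    for a, b in combinations(sorted(confusion), 2):
        if (misassignment_rate(confusion, a, b) >= threshold
                and misassignment_rate(confusion, b, a) >= threshold):
            pairs.append((a, b))
    return pairs


print(merge_candidates(confusion))
# prints: [('Region A', 'Region B')]
```

Here Region A and Region B keep trading individuals in both directions, so they are flagged as a pair that might perform better as one combined population, while Region C stands comfortably on its own.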
In the case of a perfect run where the colors of all M&Ms are identified correctly, precision and recall are both 100%. With this more convoluted output where many M&Ms are misidentified, precision and recall both decline and can be explicitly calculated.
Visual inspection supports the trends that show up in the numbers. Blue looks pretty solid. There’s only one yellow in the blue pile so it exhibits high precision (94%), and there are just a few blues that went out to the green pile so the recall (85%) is also high. The machine’s performance in identifying green is poor as it consistently categorizes other colors as green (low precision, 34%), and also frequently mistakes green for a different color (low recall, 41%). Orange is an interesting case as it performs with higher precision (80%) since its pile consists of mostly solid orange, but recall (50%) is much lower because orange was often misidentified as green, brown and yellow.
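For readers who want to see how numbers like these fall out of a messy set of piles, here is a Python sketch. A word of caution: the actual counts behind the run described above are not given, so the counts below are invented. I tuned them so that blue, green and orange reproduce the percentages quoted, and the yellow, brown and red counts are pure filler.

```python
from collections import Counter

# Invented counts: piles[pile][true_color] = M&Ms of true_color sorted into that pile.
piles = {
    "blue":   Counter(blue=17, yellow=1),
    "green":  Counter(green=11, blue=3, orange=4, red=5, yellow=4, brown=5),
    "orange": Counter(orange=8, green=2),
    "yellow": Counter(yellow=15, orange=2, green=5),
    "brown":  Counter(brown=14, orange=2, green=5),
    "red":    Counter(red=16, green=4),
}


def score(piles, color):
    """Return (precision, recall) for one color."""
    in_right_pile = piles[color][color]
    # Precision looks inside one pile: how much of it is the right color?
    pile_size = sum(piles[color].values())
    # Recall looks across all piles: where did this color end up?
    true_total = sum(pile[color] for pile in piles.values())
    return in_right_pile / pile_size, in_right_pile / true_total


for color in ("blue", "green", "orange"):
    p, r = score(piles, color)
    print(f"{color}: precision {p:.0%}, recall {r:.0%}")
# prints:
# blue: precision 94%, recall 85%
# green: precision 34%, recall 41%
# orange: precision 80%, recall 50%
```

The design mirrors what you would do by eye at the table: to judge precision you stare at one pile, and to judge recall you hunt through every pile for the strays of a single color.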
These same types of observational methods can be applied to the output of ethnicity prediction algorithms. When the algorithm designates a cluster of test samples as Swedish and most of these test individuals truly are from Sweden (not Denmark, Norway or otherwise), the performance of the process shows high precision. And when most of the Swedish samples available were actually assigned to the Sweden group (rather than mistakenly assigned to Finland, Norway or elsewhere) the algorithm is said to exhibit high recall.
So the burning question is, “What precision and recall statistics are available for each of our genetic genealogy testing companies’ reported ethnicities?” There ARE some, and once you see them, you’ll be clamoring for more. Read the answers here in this follow-up article.
Statistics tend to make my eyes cross but piles of M&Ms—I get it!! Thanks Diahan for the great analogy.
It’s all about the pictures!
My Ancestry DNA report states 62% UK, 28% Ireland, 10% Norway. I uploaded the data to FTDNA, who reported UK/Ireland 89% and Southern Europe (Greece/Italy, etc.) 11%. I was pleased that they matched for UK/Ireland, but confused about the Norway vs. S. Europe allocation. Which is more correct? Living DNA reported all UK/Ireland and an unallocated 12.5%. Thanks for the article, which I’m now going to read. Maybe I’ll get a better idea of what’s going on.
Leslie, well, "more correct" is actually kind of a tough thing to call. For sure you can know that FTDNA doesn’t have a Norway category, so you can’t expect that from them. But Norway isn’t exactly Southern Europe, so… The "answer" or at least a better answer can usually be found in the numbers for the reference populations, as well as the precision and recall numbers that are coming up soon! So stay tuned!
Thanks for the reply. My haplogroups are I1a and U5b1b1, both highly Scandinavian specific. Regarding autosomal DNA, I could understand if FTDNA mistook Norway for another nearby country, but to allocate virtually the exact same % to southern Europe instead has surprised me. I have been researching my ancestry for years and haven’t come across anyone from either of those parts of the world. It’s all kinda fascinating, if frustrating.
I think fascinating and frustrating describes this perfectly!
Please define what M&Ms are!
M&M’s the candy!
UK = Smarties.
What is mistaken for being British? Many people I know have high British as ethnicity, yet their DNA family and trees support each other and not the British ethnicity. My great grandparents on both sides mostly came from Germany. I have 2 American lines that came from England back in the 1500s. Yet even though my German line is the recent line, many sites suggest otherwise by ethnicity. I know it is just an estimate. Could all the moving around thousands of years ago account for this? What are your thoughts?
German is a hard ethnicity to tease out from your DNA because Germany is such a crossroads. Much of my own German DNA is being reported as British. British is kind of a catch-all category in a way. You will see lots of your French and German DNA reported as British. Hopefully that will change as our companies get more people involved in their reference populations.
One possibility for the surprise Norway match. In the 9th and early 10th centuries, the French were being invaded by the Vikings, coming down the Seine and sacking Paris regularly. The landowners made a treaty with one group of Vikings. If they would keep everyone else out, the French would give them Normandy. The Vikings settled in Normandy, protected the Seine, brought their families and intermarried with the French. The Normans who conquered England in 1066 and defeated the native Saxons were descendants of these Vikings who settled Normandy. So your "British" ethnicity is probably well mixed with Vikings, by way of France.
Thanks for this insight. There is so much to learn about history that will impact how we understand our genetics!
Britain was a destination and refuge for religious and other groups for many centuries. There were successive waves of Jews, Huguenots, Dutch Protestants and others that continue to this day. London, for example, was and remains a melting pot only recently rivaled by New York. There are many other historical anomalies in Europe, like Denmark encouraging Dutch farmers to settle there, which might result in somebody thinking they have Danish origins but discovering Dutch relatives.