Human genetic clustering refers to patterns of relative genetic similarity among human individuals and populations, as well as the wide range of scientific and statistical methods used to study this aspect of human genetic variation.
Clustering studies are considered valuable for characterizing the general structure of genetic variation among human populations and for contributing to the study of ancestral origins, evolutionary history, and precision medicine. Since the mapping of the human genome, and with the availability of increasingly powerful analytic tools, cluster analyses have revealed a range of ancestral and migratory trends among human populations and individuals.[1] Human genetic clusters tend to be organized by geographic ancestry, with divisions between clusters aligning largely with geographic barriers such as oceans or mountain ranges.[2][3] Clustering studies have been applied to global populations,[4] as well as to population subsets such as post-colonial North America.[5][6] Notably, the practice of defining clusters among modern human populations is largely arbitrary and variable because human genotypes vary continuously; although individual genetic markers can be used to produce smaller groups, no models produce completely distinct subgroups when larger numbers of genetic markers are used.[2][7][8]
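The point that cluster boundaries are arbitrary when variation is continuous can be illustrated with a toy simulation. The sketch below is not drawn from any of the cited studies: it assumes a hypothetical population sampled along a single geographic axis, a linear change in allele frequency along that axis, and a simple one-dimensional k-means routine standing in for the model-based methods used in practice. All function names and parameter values are illustrative assumptions.

```python
import random


def simulate_cline(n_individuals=200, n_loci=300, seed=0):
    """Simulate diploid individuals along a smooth allele-frequency gradient.

    Assumption: the allele frequency rises linearly from 0.3 to 0.7 with
    position, so there is no discontinuity anywhere along the transect.
    """
    rng = random.Random(seed)
    people = []
    for i in range(n_individuals):
        x = i / (n_individuals - 1)  # hypothetical position in [0, 1]
        p = 0.3 + 0.4 * x            # allele frequency at this position
        # Count allele copies over all loci (two draws per diploid locus).
        copies = sum((rng.random() < p) + (rng.random() < p)
                     for _ in range(n_loci))
        people.append((x, copies / n_loci))  # score lies between 0 and 2
    return people


def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd-style k-means on scalar values with evenly spaced starts."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]

    def assign():
        return [min(range(k), key=lambda j: abs(v - centers[j]))
                for v in values]

    for _ in range(iterations):
        labels = assign()
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = sum(members) / len(members)
    return assign(), centers


people = simulate_cline()
scores = [score for _, score in people]
for k in (2, 3, 4, 5):
    labels, centers = kmeans_1d(scores, k)
    sizes = [labels.count(j) for j in range(k)]
    print(f"K={k}: cluster sizes {sizes}")
```

Under these assumptions the routine returns a partition for every requested K, even though the simulated variation contains no gaps. The number of groups is set by the analyst's choice of K, not discovered in the data.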
Many studies of human genetic clustering have been drawn into discussions of race, ethnicity, and scientific racism, as some have controversially suggested that genetically derived clusters may be understood as proof of genetically determined races.[9][10] Although cluster analyses invariably organize humans (or groups of humans) into subgroups, since the work of evolutionary biologists such as Richard Lewontin, Luigi Cavalli-Sforza, and Marcus Feldman in the 1970s it has been widely accepted within human genetics that none of these genetic clusters can be attributed to races, and that knowing an individual's skin tone or continent of origin does not constitute a meaningful prediction of specific alleles.[11] Moreover, because the fraction of genetic variation between human genotypes is so small overall, genetic clustering approaches are highly dependent on the sampled data, genetic markers, and statistical methods used to construct them. It has also been repeatedly demonstrated by various methodologies that the five races (caucasoid, mongoloid, negroid, American or "red", and Malay) historically posited by scientific racism do not correspond to population substructures derivable from any modern genomic dataset.[12] Rather, the evidence for clinal patterns of human genetic variation far outweighs any evidence for distinct groups defined by skin pigmentation or skull shape.[11] Arbitrarily specifying five population clusters in an attempt to test the genomic validity of scientific racism instead yields three "races" within Africa, one encompassing most of Europe and mainland Asia, and one encompassing Australia, the Americas, and the Pacific Islands.[13]
Auton, Adam; Abecasis, Gonçalo R.; Altshuler, David M.; Durbin, Richard M.; Bentley, David R.; Chakravarti, Aravinda; Clark, Andrew G.; Donnelly, Peter; Eichler, Evan E.; Flicek, Paul; Gabriel, Stacey B.; Gibbs, Richard A.; Green, Eric D.; Hurles, Matthew E. (October 2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. doi:10.1038/nature15393. hdl:11693/38161. ISSN 1476-4687.