Study offers recommendations to improve statistical inference in population genomics

The 2nd century Alexandrian astronomer and mathematician Claudius Ptolemy had great ambition. Hoping to make sense of the movement of stars and the trajectories of planets, he published a masterful treatise on the subject, known as the Almagest. Ptolemy created a complex mathematical model of the universe that seemed to recapitulate the movements of the celestial objects he observed.

Unfortunately, a fatal flaw lay at the heart of his cosmic scheme. Following the prejudices of his time, Ptolemy assumed that the Earth was the center of the universe. The Ptolemaic universe, composed of complex “epicycles” to account for the motions of planets and stars, has long since been consigned to the history books, although its conclusions remained scientific dogma for over 1,200 years.

The field of evolutionary biology is no less prone to flawed theoretical approaches, sometimes producing impressive models that nevertheless fail to convey the true workings of nature as it shapes the dizzying array of living forms on Earth.

A new study examines mathematical models designed to draw conclusions about how evolution works at the level of populations of organisms. The study concludes that such models must be constructed with great care, avoiding unwarranted initial assumptions, weighing the quality of existing knowledge and remaining open to alternative explanations.

Failure to apply strict procedures in constructing null models can lead to theories that appear to match some aspects of the available data derived from DNA sequencing but fail to properly elucidate the underlying evolutionary processes, which are often very complex and multifaceted.

Such theoretical frameworks can offer compelling but ultimately erroneous pictures of how evolution actually acts on populations over time, whether they are populations of bacteria, schools of fish, or human societies and their various migrations during prehistory.

In the new study, Jeffrey Jensen, a researcher at Arizona State University’s Biodesign Center for Mechanisms of Evolution and a professor in the School of Life Sciences at the Center for Evolution & Medicine, leads a group of international luminaries in the field in providing advice for future research. Together they describe a range of criteria that can be used to better ensure the accuracy of models that produce statistical inferences in population genomics – a scientific discipline concerned with large-scale comparisons of DNA sequences within and between populations and species.

“One of our key messages is the importance of considering the contributions of evolutionary processes that are certain to be in constant operation (such as purifying selection and genetic drift) before simply relying on hypothesized or rare evolutionary processes as the primary drivers of observed population variation (such as positive selection).”

Jeffrey Jensen, researcher, Biodesign Center for Mechanisms of Evolution, Arizona State University

The research results appear in the current issue of the journal PLOS Biology.

A field matures

Population genomics emerged when early efforts in the field attempted to reconcile Charles Darwin’s notion of evolution through natural selection with early insights into the mechanisms of heredity, discovered by the Augustinian monk Gregor Mendel.

This synthesis culminated in the 1920s and early 1930s, largely due to the mathematical work of Fisher, Haldane, and Wright, who pioneered the exploration of how natural selection, along with other evolutionary forces, would alter the genetic makeup of Mendelian populations over time.

Today, studies in population genomics involve the large-scale application of various genomic technologies to explore the genetic composition of biological populations, and how various factors, including natural selection and genetic drift, produce changes in the genetic composition over time.

To do this, population geneticists develop mathematical models that quantify the contributions of these evolutionary processes in shaping gene frequencies, use this theory to design statistical inference approaches that estimate the forces producing the observed patterns of genetic variation in real populations, and test their conclusions against accumulated data.

The spice of life

The study of genomic variation focuses on DNA sequence differences between individuals and populations. Some of these variants are of critical importance for biological function, including mutations responsible for genetic diseases, while others have no detectable biological effect.

Such variation in the human genome can take many forms. A common source of variation is known as single nucleotide polymorphisms, or SNPs, where a single letter of DNA in the genome is altered. But larger-scale variation of the genome, involving the simultaneous alteration of hundreds or even thousands of base pairs, is also possible. Again, some of these alterations may play a role in disease risk and survival, while many others have no effect.

Natural selection can occur when different segregating variants in a population have a fitness differential with respect to each other. By devising and studying mathematical models governing the frequency change of corresponding genes and applying these models to empirical data, population geneticists seek to understand contributing evolutionary processes in a rigorous and quantitative way. Thus, population genetics is often considered the theoretical cornerstone of modern Darwinian evolution.
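To make the idea of a model "governing the frequency change" concrete, here is a minimal sketch in Python of a textbook recursion for one variant under selection. It is an illustration rather than anything taken from the study: it assumes a haploid population of effectively infinite size, two competing variants, and a constant relative fitness of 1 + s for the variant being tracked. The names next_frequency, trajectory, and s are chosen here for illustration.

```python
def next_frequency(p, s):
    """Variant frequency after one generation of selection.

    Assumes a haploid, effectively infinite population in which the
    tracked variant has relative fitness 1 + s and the alternative
    variant has relative fitness 1 (illustrative assumptions).
    """
    mean_fitness = 1.0 + s * p  # population-average relative fitness
    return p * (1.0 + s) / mean_fitness


def trajectory(p0, s, generations):
    """Frequencies from generation 0 through `generations`, inclusive."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # A beneficial, a neutral, and a deleterious variant, each starting at 1%.
    for s in (0.05, 0.0, -0.05):
        final = trajectory(0.01, s, 200)[-1]
        print(f"s = {s:+.2f}: frequency after 200 generations = {final:.4f}")
```

In this deterministic setting a neutral variant (s = 0) never changes frequency, which is exactly why chance effects have to be modeled separately.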

Adrift through the genome

Although the importance of natural selection in the evolutionary process is undeniable, positive selection, which increases the frequency of beneficial variants and is the potential driver of adaptation, is certain to be relatively rare, even compared with other forms of natural selection. For example, purifying selection, the removal of deleterious variants from the population, is a constantly acting and much more pervasive form of selection.

In addition, there are multiple non-selective evolutionary processes of great importance. For example, genetic drift describes the many stochastic fluctuations inherent in evolution. In large populations, natural selection can act more effectively by purging deleterious variation and potentially fixing beneficial variation, whereas, as populations become smaller, genetic drift will be increasingly dominant.
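The interplay between population size, selection, and drift can be illustrated with a small simulation. The sketch below uses a simple Wright-Fisher-style model, a standard textbook construction rather than the method of the study. It assumes a haploid population of constant size, a single variant with selection coefficient s (negative for a deleterious variant), and an arbitrary starting frequency of 50%. All function names, parameter values, and the seed are hypothetical choices made for illustration.

```python
import random


def simulate_until_absorbed(pop_size, s, p0, rng, max_generations=10_000):
    """Follow one variant until it is lost (0.0) or fixed (1.0).

    If neither happens within max_generations, the current
    intermediate frequency is returned instead.
    """
    count = round(p0 * pop_size)  # copies of the variant
    for _ in range(max_generations):
        if count == 0 or count == pop_size:
            break
        p = count / pop_size
        # Selection: shift the expected frequency (relative fitness 1 + s).
        p_selected = p * (1.0 + s) / (1.0 + s * p)
        # Drift: each of the pop_size offspring independently inherits the
        # variant with probability p_selected (binomial sampling).
        count = sum(1 for _ in range(pop_size) if rng.random() < p_selected)
    return count / pop_size


def fixation_fraction(pop_size, s, p0=0.5, replicates=200, seed=1):
    """Fraction of replicate populations in which the variant fixes."""
    rng = random.Random(seed)  # seeded for reproducibility
    fixed = sum(
        1
        for _ in range(replicates)
        if simulate_until_absorbed(pop_size, s, p0, rng) == 1.0
    )
    return fixed / replicates


if __name__ == "__main__":
    # The same deleterious variant (s = -0.1) in a small and a larger population.
    print("N = 10: ", fixation_fraction(10, -0.1))
    print("N = 200:", fixation_fraction(200, -0.1, replicates=20))
```

Under these assumptions the deleterious variant is expected to fix in a substantial minority of the small populations purely by chance, while in the larger population selection should purge it almost every time.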

The distinction can be seen in dramatic form when comparing prokaryotic organisms like bacteria with organisms composed of eukaryotic cells, including humans. In the former case, the large population sizes tend to result in more efficient selection. In contrast, a weaker selection pressure operating in eukaryotes is more permissive to genomic modifications, provided they are not highly deleterious.

According to the neutral theory of molecular evolution, a now-guiding principle of evolutionary theory proposed by population geneticist Motoo Kimura more than 50 years ago, most evolutionary changes at the molecular level in real populations are governed not by natural selection but by genetic drift. The study underlines that this critical point is too often overlooked by evolutionary biologists. As co-author Michael Lynch, director of ASU’s Biodesign Center for Mechanisms of Evolution, observes, “Natural selection is just one of many evolutionary mechanisms, and the failure to realize this is probably the most important obstacle to a successful integration of the theory of evolution with molecular, cellular and developmental biology.”

The new consensus study further highlights that failing to consider these alternative evolutionary mechanisms that are certain to be operating, including genetic drift, and to incorporate them into population genomics models is likely to lead researchers astray. Overreliance on purely adaptive models to explain genomic variation has led to a host of interpretations of questionable value, the authors argue.

The study presents a detailed flowchart that can help guide the development of more accurate models used to draw evolutionary inferences from genomic data. Biological parameters that vary among species include not only evolutionary variables such as population size, mutation rates, recombination rates, and population structure and history, but also the structure of the genome itself and life history traits, including mating behavior. All of these factors play a vital role in determining the observed molecular variation and evolution.

“While these many considerations may seem daunting to some researchers, it is important to note that many excellent research groups at ASU and around the world are actively improving our understanding of these underlying evolutionary parameters, constantly providing improved inference of, for example, mutation and recombination rates,” added co-author Susanne Pfeifer, assistant professor at the Center for Evolution & Medicine and the Biodesign Center for Mechanisms of Evolution.

Where once theoretical models of population genomics proliferated alongside relatively sparse genomic data, today an avalanche of data, made possible by the rapid and inexpensive DNA sequencing of organisms across the tree of life, has radically changed the field. Careful and judicious use of this goldmine of genomic data will help advance the most rigorous models to unravel the many remaining mysteries of evolution.


Journal reference:

Johri, P. et al. (2022) Recommendations for improving statistical inference in population genomics. PLOS Biology.
