Using physics to search for meaning in the chaos of gene regulation

Thursday, July 9, 2020

By applying ideas from theoretical physics, researchers are starting to develop a unified understanding of how genes are regulated. Below, graduate student Jonathan Liu explores the interface between physics and biology and discusses the field of biophysics with two researchers.


In 2003, the Human Genome Project, a milestone collaboration that cost $2.7 billion and lasted 13 years, finally succeeded in obtaining the sequence of the human genome. Since then, sequencing technology has improved radically: today, we can sequence all 3 billion letters of the human genome in less than a day and for under $1,000.

Although DNA sequencing has provided us with the entire library of the human genome, the bulk of the books within are written in a language that we have yet to understand. That’s because only 2% of all human DNA actually codes for proteins, with the other 98% serving a murkier purpose, or even no clear purpose at all. Without this understanding, the functional nature of DNA remains a mystery.

“We don’t truly understand something unless we can reconstruct it from scratch,” says Hernan Garcia, an assistant professor in the Molecular and Cell Biology and Physics departments. He is a proponent of the proof-by-synthesis approach, the notion that true understanding of a system requires the ability to rationally design it from the bottom up without resorting to trial and error. The challenge this poses is perhaps best exemplified by embryonic development, the process by which a group of stem cells matures into a population of differentiated adult cells. By systematically studying developmental processes in the fruit fly, one of biology’s best-understood model organisms, geneticists discovered that development is controlled by an intricate network of genes turning on and off in a regulated fashion, and they managed to construct a genetic blueprint of sorts describing this process.

Nevertheless, such a blueprint is only a “who’s who” of the relevant players, a list of characters without a script. While the qualitative relationships between these developmental genes are known, the mathematical rules governing the precise form of these relationships still elude us. Without these underlying equations, proof by synthesis is fundamentally impossible.

More specifically, developmental genes code for special proteins called transcription factors, which control the behavior of target genes by chemically binding to special sequences known as regulatory DNA. Through this regulation, the target genes then produce their own transcription factors, resulting in a complex cascade of interactions controlling important developmental decisions, such as distinguishing an embryo’s head from its tail. Accounting for all of the transcription factors and genes involved in early embryonic development thus yields a genetic network, analogous to the wiring diagram of a computer circuit.

Unlike an electronic circuit, however, these constructed genetic networks lack a crucial element: quantitative predictability. Although the wiring diagram of a modern processing unit may involve thousands of component parts, each line and symbol corresponds to a mathematical equation that is well-understood and predictable. These fundamental equations are still lacking in much of genetics.

For example, one of the most important genes involved in early fly development is hunchback, whose activity contributes to differentiating an embryo’s head from its tail. While it is well-established that hunchback is activated by a transcription factor named Bicoid, researchers are still hunting for the precise equations that describe this relationship. Notably, we still cannot predict precisely how much hunchback activation to expect, given the amount of Bicoid present.

To tackle this mystery, Garcia’s lab applies techniques historically used in theoretical physics. The world of physics often conjures up images of lasers, nuclear power plants, or black holes, all concepts far removed from those of genomes and proteins. Nevertheless, physics offers a fundamental assertion that the natural world can be quantitatively (and, most importantly, predictively) described with mathematical equations and theories.

This merging of biology and physics broadly falls under the label of biophysics: the application of tools and techniques from physics to solve research questions in biology. Such an approach has seen great successes, from describing the structure of proteins to characterizing the motion of swimming bacteria.

A complex, messy world

Working at the interface between two disparate fields, physics and biology, can be a daunting task. Physicists need to immerse themselves fully in the complex, even messy, world of biology. “For me, grappling with the breadth of biological knowledge requires patience and humility,” remarks Gabriella Martini, a graduate student in Garcia’s lab.

Like many biophysicists, Martini majored in physics as an undergraduate with no particular inclination towards biology. While her undergraduate research at MIT focused mainly on theoretical particle physics, Martini credits a senior year biology course with attuning her to the fascinating research problems in biophysics.

When applying to graduate programs, Martini was aware of her lack of formal experience in biology. “I wanted to choose a PhD program that would hold me accountable for learning more on the biology side,” explains Martini.

In describing the challenges of working as a classically trained physicist in biology, she believes that “it’s very important to try to meet people with different backgrounds and orientations into research questions halfway. If you don’t, you stand to miss out on the incredible wealth of knowledge that could be valuable in refining your own work.”

Meaning amidst chaos

Instead of taking a broad approach that attempts to incorporate all of the known information about early fruit fly development, Garcia favors a more minimalist perspective that begins with mathematical theory.

“Biology historically celebrates complexity, which can be difficult to reconcile with theory,” Garcia explains. “A key part of physics, on the other hand, involves developing theoretical models that are only as complex as necessary, and no more.” In this vein, biophysicists must balance the tension between models that are mathematically tractable and models that accurately explain reality.

Much of the scientific legwork has already been done. Transcription factors and genes can follow very similar rules to electrons and atoms, falling under a branch of physics known as statistical mechanics. This class of theories specializes in describing complex systems with many interacting components, eschewing microscopic explanations in favor of coarse-grained predictions of average behavior. With statistical mechanics, members of the Garcia lab generate simple predictions for various modes of gene regulation, forming theoretical building blocks that attempt to unify the complex milieu of embryonic development.
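
To give a flavor of what such a building block can look like, here is a minimal Python sketch of a textbook-style thermodynamic calculation. It is not the Garcia lab’s actual model. It assumes a stretch of regulatory DNA with a few identical binding sites for an activator such as Bicoid, assigns a statistical weight to each possible number of bound sites, and averages over those states to predict how occupied the DNA is at a given activator concentration. The function names, the dissociation constant kd, and the cooperativity factor omega are all illustrative assumptions.

```python
from math import comb


def state_weights(conc, kd, n_sites, omega=1.0):
    """Statistical weight of having k of n_sites bound, for k = 0..n_sites.

    Assumptions (illustrative only): all sites are identical with
    dissociation constant kd, and every pair of bound activators
    contributes a cooperativity factor omega (omega = 1 means the
    sites bind independently).
    """
    x = conc / kd  # dimensionless activator concentration
    return [
        comb(n_sites, k) * x**k * omega ** (k * (k - 1) // 2)
        for k in range(n_sites + 1)
    ]


def mean_occupancy(conc, kd, n_sites, omega=1.0):
    """Average fraction of binding sites occupied (between 0 and 1)."""
    weights = state_weights(conc, kd, n_sites, omega)
    partition_function = sum(weights)  # normalizes weights into probabilities
    mean_bound = sum(k * w for k, w in enumerate(weights)) / partition_function
    return mean_bound / n_sites


if __name__ == "__main__":
    # Hypothetical concentrations in units of kd; compare independent
    # binding (omega = 1) with cooperative binding (omega = 10).
    for conc in (0.1, 0.5, 1.0, 2.0, 10.0):
        independent = mean_occupancy(conc, kd=1.0, n_sites=3)
        cooperative = mean_occupancy(conc, kd=1.0, n_sites=3, omega=10.0)
        print(f"conc={conc:5.1f}  independent={independent:.3f}  "
              f"cooperative={cooperative:.3f}")
```

With a single site, the prediction reduces to the familiar conc / (conc + kd) binding curve. Whether any curve of this family matches a real gene is exactly what experiments have to decide.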

Of course, mathematical theories hold no water without experimental validation. To test their theories, the Garcia lab subscribes to the idea that better data beats more data.

“Having a strong theoretical foundation allows you to be much more precise about how and why you measure things,” Garcia emphasizes. His lab focuses on precision measurements of specific genes and transcription factors. By labeling proteins and genes of interest with fluorescent proteins that light up under a microscope, they can visualize gene regulation events in living fruit fly embryos. These measurements provide a detailed description of a particular gene’s activity in space and time, allowing for accurate tests of the aforementioned theoretical predictions.

The Garcia lab has seen some preliminary successes. In a recent preprint posted on bioRxiv, the lab investigated whether an entire class of models based on statistical mechanics, known as thermodynamic models, could describe the activity of hunchback. After comparing key aspects of hunchback expression, such as the timing of when the gene turned on, they concluded that the experimentally observed features were inconsistent with the thermodynamic models’ predictions. Ruling out an entire class of models helped them identify theoretical directions toward models that could work. That, in turn, paves the way for potential future applications, such as programming cells to perform precise functions for bioengineering in areas like agriculture or therapeutics.

Garcia is optimistic about the future of research in embryonic development. “There’s this idea that biology is too complicated to predictively understand, and that there’s something akin to magic in the ultimate functional complexity of biology,” he notes. “I think that’s a bit of a defeatist argument; with the right tools and the right mindsets, we can attack the question of complexity and start making sense of all the chaos.”

Jonathan Liu