DNA and RNA have been compared to “instruction manuals” containing the information needed for living “machines” to operate. But while electronic machines like computers and robots are designed from the ground up to serve a specific purpose, biological organisms are governed by a much messier, more complex set of functions that lack the predictability of binary code. Inventing new solutions to biological problems requires teasing apart seemingly intractable variables, a task that daunts even the most intrepid human brains.
Two teams of scientists from the Wyss Institute at Harvard University and the Massachusetts Institute of Technology have devised pathways around this roadblock by going beyond human brains; they developed a set of machine learning algorithms that can analyze reams of RNA-based “toehold” sequences and predict which ones will be most effective at sensing and responding to a desired target sequence. As reported in two papers published concurrently today in Nature Communications, the algorithms could be generalizable to other problems in synthetic biology as well, and could accelerate the development of biotechnology tools to improve science and medicine and help save lives.
“These achievements are exciting because they mark the starting point of our ability to ask better questions about the fundamental principles of RNA folding, which we need to know in order to achieve meaningful discoveries and build useful biological technologies,” said Luis Soenksen, a postdoctoral fellow at the Wyss Institute and Venture Builder at MIT’s Jameel Clinic who is a co-first author of the first of the two papers.
Getting ahold of toehold switches
The collaboration between data scientists from the Wyss Institute’s Predictive BioAnalytics Initiative and synthetic biologists in Wyss core faculty member Jim Collins’ lab at MIT was created to apply the computational power of machine learning, neural networks, and other algorithmic architectures to complex problems in biology that have so far defied resolution.
As a proving ground for their approach, the two teams focused on a specific class of engineered RNA molecules: toehold switches, which are folded into a hairpin-like shape in their “off” state. When a complementary RNA strand binds to a “trigger” sequence trailing from one end of the hairpin, the toehold switch unfolds into its “on” state and exposes sequences that were previously hidden within the hairpin, allowing ribosomes to bind to and translate a downstream gene into protein molecules. This precise control over the expression of genes in response to the presence of a given molecule makes toehold switches very powerful components for sensing substances in the environment, detecting disease, and other purposes.
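The on/off logic described above can be illustrated with a deliberately simplified sketch. It assumes that a switch turns "on" whenever an incoming RNA contains the exact reverse complement of the switch's binding region. Real switches depend on RNA folding energetics rather than exact string matching, and the function names and sequences here are hypothetical, chosen only for illustration.

```python
# Toy model only: real toehold switch behavior depends on RNA folding
# energetics, not on exact string matching.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def switch_state(binding_site: str, candidate_rna: str) -> str:
    """Return "on" if candidate_rna contains a stretch that can pair with binding_site."""
    return "on" if reverse_complement(binding_site) in candidate_rna else "off"


# Hypothetical sequences, not taken from the papers.
binding_site = "AUGGCUAGCUAA"
matching_rna = "GG" + reverse_complement(binding_site) + "CC"
unrelated_rna = "GGGGCCCCAAAA"

print(switch_state(binding_site, matching_rna))   # on
print(switch_state(binding_site, unrelated_rna))  # off
```

A sketch like this captures only the sequence-matching idea. It says nothing about how strongly a given switch responds, which is the harder question.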
However, many toehold switches do not work very well when tested experimentally, even though they have been engineered to produce a desired output in response to a given input based on known RNA folding rules. Recognizing this problem, the teams decided to use machine learning to analyze a large volume of toehold switch sequences and apply insights from that analysis to predict more accurately which toeholds reliably perform their intended tasks. That would allow researchers to quickly identify high-quality toeholds for various experiments.
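As a rough illustration of what ranking candidate sequences by predicted quality might look like in code, the sketch below one-hot encodes RNA sequences and sorts them by a score. One-hot encoding is a common way to present sequences to a model and is an assumption here, not a description of the teams' pipeline. The scoring function is a hypothetical placeholder standing in for a trained model, and the candidate sequences are made up.

```python
from typing import Callable, List

BASES = "ACGU"
Encoded = List[List[int]]


def one_hot(rna: str) -> Encoded:
    """Encode each base as a 4-element indicator vector in A, C, G, U order."""
    return [[1 if base == b else 0 for b in BASES] for base in rna]


def rank_candidates(candidates: List[str],
                    score_fn: Callable[[Encoded], float]) -> List[str]:
    """Sort candidate sequences from highest to lowest predicted score."""
    return sorted(candidates, key=lambda seq: score_fn(one_hot(seq)), reverse=True)


def placeholder_score(encoded: Encoded) -> float:
    """Stand-in for a trained model: the fraction of G or C bases.

    This is NOT a real predictor of toehold performance; it exists only
    so the ranking step has something to call.
    """
    return sum(row[1] + row[2] for row in encoded) / len(encoded)


# Hypothetical candidate sequences, not taken from the papers.
candidates = ["AUAUAUAU", "GCGCAUAU", "GCGCGCGC"]
print(rank_candidates(candidates, placeholder_score))
```

In practice the placeholder would be replaced by a model trained on experimentally measured switch performance, and the ranking would be only as good as that model.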