A famous AI has learned a new trick: how to do chemistry

Artificial intelligence has changed the way science is practiced by allowing researchers to analyze the massive amounts of data generated by modern scientific instruments. It can find a needle in a million haystacks of information and, using deep learning, it can learn from the data itself. AI is accelerating progress in gene hunting, medicine, drug design and the creation of organic compounds.

Deep learning uses algorithms, often neural networks trained on large amounts of data, to extract insights from new data. It is very different from traditional computing with its step-by-step instructions. Rather, it learns from data. Deep learning is much less transparent than traditional computer programming, which leaves important questions: what has the system learned, what does it know?

As a chemistry teacher, I like to design tests that have at least one challenging question that stretches students’ knowledge, to determine whether they can combine different ideas and synthesize new ideas and concepts. We designed such a question for the poster child of AI advocates, AlphaFold, which solved the protein folding problem.

Protein folding

Proteins are present in all living organisms. They structure cells, catalyze reactions, transport small molecules, digest food and much more. They are made up of long chains of amino acids like beads on a string. But for a protein to do its job in the cell, it has to twist and bend into a complex three-dimensional structure, a process called protein folding. Misfolded proteins can lead to disease.


Within milliseconds of a chain of amino acids (left) exiting the ribosome, it folds into the lowest-energy 3D shape (right) needed for the protein to function. Credit: Marc Zimmer, CC BY-ND

In his acceptance speech for the Nobel Prize in Chemistry in 1972, Christian Anfinsen postulated that it should be possible to calculate the three-dimensional structure of a protein from the sequence of its building blocks, the amino acids.

Just as the order and spacing of the letters in this article give it meaning and message, the order of the amino acids determines the identity and shape of the protein, which in turn determine its function.

Due to the inherent flexibility of the amino acid building blocks, a typical protein can adopt an estimated 10 to the power of 300 different shapes. That’s a huge number, more than the number of atoms in the universe. Yet within a millisecond, every protein in an organism will fold into its own specific shape – the lowest-energy arrangement of all the chemical bonds that make up the protein. Change just one of the hundreds of amino acids typically found in a protein and it may misfold and stop working.
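
To get a feel for the size of that number, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, a chain of 300 amino acids with about 10 possible local conformations each; neither figure comes from this article, and the 10 to the power of 80 used for the atoms in the observable universe is a commonly quoted order-of-magnitude estimate, not a measured value.

```python
import math

# Illustrative assumptions only (not taken from the article):
# a chain of 300 amino acids, each able to take about 10 local conformations.
RESIDUES = 300
STATES_PER_RESIDUE = 10

# Commonly quoted order-of-magnitude estimate for the number of atoms
# in the observable universe, expressed as a power of ten.
LOG10_ATOMS_IN_UNIVERSE = 80


def log10_shapes(residues: int, states_per_residue: int) -> float:
    """Return log10 of states_per_residue ** residues.

    Working with the exponent avoids printing a 300-digit number.
    """
    return residues * math.log10(states_per_residue)


exponent = log10_shapes(RESIDUES, STATES_PER_RESIDUE)
excess = exponent - LOG10_ATOMS_IN_UNIVERSE

print(f"possible shapes: about 10^{exponent:.0f}")
print(f"that is about 10^{excess:.0f} times the number of atoms in the universe")
```

Changing either assumed constant shows how quickly the count grows: the number of shapes multiplies with every amino acid added to the chain, which is why checking the shapes one by one is hopeless.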

AlphaFold

For 50 years, computer scientists tried to solve the protein folding problem, with little success. Then in 2016 DeepMind, an artificial intelligence subsidiary of Alphabet, the parent company of Google, launched its AlphaFold program. It used the Protein Data Bank as its training set, which contains the experimentally determined structures of more than 150,000 proteins.

In less than five years, AlphaFold had the protein folding problem beat, or at least the most useful part of it: determining a protein’s structure from its amino acid sequence. AlphaFold does not explain how proteins fold so quickly and precisely. This was a major victory for AI, because it not only earned enormous scientific prestige but was also a major scientific breakthrough that could affect everyone’s life.

Today, thanks to programs like AlphaFold2 and RoseTTAFold, researchers like me can determine the three-dimensional structure of a protein from the sequence of amino acids that make it up, at no cost, in an hour or two. Before AlphaFold2, we had to crystallize proteins and solve their structures using X-ray crystallography, a process that took months and cost tens of thousands of dollars per structure.

We now also have access to the AlphaFold Protein Structure Database, where DeepMind has deposited the 3D structures of nearly every protein found in humans, mice, and over 20 other species. To date, they have solved over a million structures and plan to add another 100 million structures this year alone. Knowledge of proteins has exploded. The structure of half of all known proteins is expected to be documented by the end of 2022, including many unique new structures associated with useful new functions.

Think like a chemist

AlphaFold2 was not designed to predict how proteins would interact with each other, but it has been able to model how individual proteins combine to form large complex units composed of several proteins. We had a tough question for AlphaFold: Had its structural training set taught it some chemistry? Could it tell whether amino acids would react with each other – a rare but important event?

AlphaFold2 can take the amino acid sequence of fluorescent proteins (top letters) and predict their 3D barrel shapes (middle). It’s not surprising. What is totally unexpected is that it can also predict which fluorescent proteins are “broken” and cannot become fluorescent. Credit: Marc Zimmer, CC BY-ND

I am a computational chemist interested in fluorescent proteins. These are proteins present in hundreds of marine organisms such as jellyfish and corals. Their glow can be used to illuminate and study diseases.

There are 578 fluorescent proteins in the Protein Data Bank, 10 of which are “broken” and do not fluoresce. Proteins rarely attack themselves, a process called autocatalytic post-translational modification, and it is very difficult to predict which proteins will react with themselves and which will not.

Only a chemist with a significant amount of knowledge about fluorescent proteins would be able to use the amino acid sequence to find the fluorescent proteins that have the right amino acids to undergo the chemical transformations necessary to make them fluorescent. When we presented AlphaFold2 with the sequences of 44 fluorescent proteins that are not in the Protein Data Bank, it folded the working fluorescent proteins differently from the broken ones.

The result stunned us: AlphaFold2 had learned a bit of chemistry. It had figured out which amino acids in fluorescent proteins do the chemistry that makes them glow. We suspect that the Protein Data Bank training set and multiple sequence alignments allow AlphaFold2 to “think” like chemists and look for the amino acids that must react with each other to make the protein fluorescent.

A folding program learning some chemistry from its training set also has wider implications. By asking the right questions, what else can we gain from other deep learning algorithms? Could facial recognition algorithms find hidden disease markers? Could algorithms designed to predict consumer spending habits also find a propensity for petty theft or deception? And most importantly, is this ability, along with similar leaps in capability in other AI systems, desirable?

Provided by
The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation: A Famous AI Learned a New Trick: How to Do Chemistry (Jun 17, 2022), retrieved Jun 17, 2022 from https://phys.org/news/2022-06-celebrated-ai-chemistry.html

This document is subject to copyright. Except for fair use for purposes of private study or research, no part may be reproduced without written permission. The content is provided for information only.
