Prediction: this won’t make much difference for either biology or medicine in general. The one big thing it will do is cause funding agencies to stop wasting so much money on protein structure studies (assuming that AlphaFold’s results generalize beyond this particular challenge, which I’m uncertain about). The whole field of structural biology was 95% useless anyway.
It is an interesting result from the AI angle, though.
That’s an interesting take. Do you have a simple explanation of why you consider structural biology useless? My outside-view impression was that protein shape and folding were really important to understanding how proteins work. Isn’t that useful in practice?
We mainly want to know (a) what reactions a protein is involved in, and (b) the rate constants on those reactions. In practice, protein shape tells us very little about either of those without extensive additional simulation. (It can give some hints as to what broad classes of reaction the protein might be involved in, but my understanding is that we can get most of those same hints from the sequence alone.)
In principle, folded protein structures could be used as an input to those sorts of simulations, but the simulation is expensive in much the same way as the folding problem itself, and as far as I know the cutting edge in simulation still can’t provide precision or speed comparable to high-throughput assays (even given folded structures).
In gears terms: everything we care about in a high-dimensional protein structure is summarized by low-dimensional reaction rates, so proteins make really good gears. A practical consequence is that directly measuring reaction rates is way more efficient than simulating all the low-level activity. There are things that approach can’t handle—e.g. we don’t know how a change to the protein will change reaction rates—but even with protein folding “solved”, simulation isn’t at the point where it can make those predictions faster and more precisely than a new experiment.
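To make the "low-dimensional reaction rates" point concrete, here is a minimal sketch using Michaelis-Menten kinetics, one common way to summarize an enzyme with two measured constants. The choice of model and every numeric value below are illustrative assumptions, not something stated in this thread.

```python
import math


def michaelis_menten_rate(substrate_conc, enzyme_conc, k_cat, k_m):
    """Steady-state rate v = k_cat * [E] * [S] / (K_M + [S]).

    The enzyme's entire 3D structure enters only through two numbers,
    k_cat and K_M, both of which can be measured by assay.
    """
    return k_cat * enzyme_conc * substrate_conc / (k_m + substrate_conc)


# Hypothetical constants for an imaginary enzyme (illustration only).
K_CAT = 100.0    # turnover number, 1/s
K_M = 50e-6      # Michaelis constant, mol/L
ENZYME = 1e-9    # enzyme concentration, mol/L

if __name__ == "__main__":
    for s in (5e-6, 50e-6, 500e-6):
        v = michaelis_menten_rate(s, ENZYME, K_CAT, K_M)
        print(f"[S] = {s:.0e} M -> v = {v:.2e} M/s")
```

Nothing about atomic coordinates appears in the function: once the two constants are measured, the structure is not needed to predict the rate under this model.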
Does knowing the structure of a protein help with simulating how it responds to an arbitrary or unknown protein, molecule, agonist, antagonist, or superagonist? (It seems that even with all the protein structures we do know well, finding appropriate agonists with the desired action is still a huge unsolved problem.) Is simulation a much more difficult problem than “folding”?
This allows us to design “efficient” proteins. Proteins designed “intelligently” often tend to be smaller and less “messy” and “bulky” than naturally evolved proteins [which also cross over at the most pedagogically unhelpful sites ever], and with protein folding solved, it may be easier to design proteins that are less complicated and more amenable to simulation than the natural set of proteins. It may even be possible to find a specific transferase that can precisely add a methyl or carboxyl group to any molecule at any location, or a lyase that can split a molecule at any arbitrary location. We may also be able to design proteins based on properties like how easy it is to introduce them into the cell via mRNA. The genes for many natural proteins are not easy to introduce into the cell via CRISPR or AAV, but because protein design-space is so large, you can probably design another protein with the same function that can be delivered via mRNA or CMV-based vectors, without needing to force the corresponding gene into the right location in the cell’s nucleus.
Anyhow, designing proteins for industrial chemistry (e.g., to properly degrade polyethylene plastics in the ocean), or proteins with a specific physical property rather than a very specific function, is a much easier problem than, say, figuring out how to make an extremely particular histone acetyltransferase, DNA methyltransferase, or chaperone localize or diffuse to the locations where it can precisely do the right things at {X} sites and not do the wrong things at the {Y} other sites. Such proteins often sit at the center of hub networks, and their messiness evolved because they need extremely precise interactions with other proteins that have also evolved into messy, bloated behemoths.
Also, this helps us develop a “periodic table of protein function”, where you can design proteins that carry out function X by changing certain motifs in them, and the result will be much cleaner, more organizable, and more predictable than the super-messy [and hard to organize] set of protein motifs we find in the wild. I think this is especially relevant for manufacturing and industrial chemistry: proteins that broadly carry out functions sort of similar to zymogens.
The whole field of structural biology was 95% useless anyway.
As long as it produces machine-interpretable output, it’s useful for training new algorithms, even if the vast majority of humans are unable to properly interpret protein structure.
^Anyhow, this post was replying to the idealized version. Protein folding is still far from solved, as https://twitter.com/mctucsf/status/1333447404910112768 explains. It’s an exciting advance to be sure. I think this allows us to better figure out what a stable system of ultrastructural scaffolds is first before figuring out what precise things can be built USING those ultrastructural scaffolds.
I disagree with your assessment that structural biology is useless. Knowing the shape of a protein can be pretty important if you want to perturb the protein’s function by, say, finding or creating a small molecule that binds to it. Crystal structures or cryo-EM structures can shed a lot of light on how a molecule binds to its target, which in turn can suggest further modifications to try and make a tighter binder. It’s not clear to me yet how easy or hard it will be to simulate ligand-protein binding using AlphaFold. I’d lean toward ‘hard’ but maybe molecular dynamics simulations would dovetail well with a structure determined by AlphaFold.
If you have a protein, and you know it’s designed to bind to something but you don’t know what, then maybe running a lot of imprecise simulations (using its folded structure) will allow you to narrow down the list of candidates, and thereby significantly reduce the time and cost of experiments?
(Not an expert, just guessing)
That is the dream. The reality is harder, and the combinatorics are not friendly.
In practice, trying to “catch 2 proteins hanging out together” has usually been easier.
The main way we actually check to see if 2 proteins are interacting is… well, this metaphor is fun.
We try to work out which proteins are a couple, by trying to catch the proteins holding hands at the school dance. Either by freezing them, or sticking glue on their hands.
Sometimes even dragging one of them out of the school dance, and then checking to see if the other one tagged along.
Or if you already have a pretty good guess, try just grounding one of them and see if the other one starts acting weird.
I guess this turns the simulation method into “computer-modeling which people are likely to end up in a relationship together” which… seems to capture some of the right intuitions for how hard it is, and how much knowing “they were present in the same place at the same time” matters (whether they had an opportunity to meet in a cell type & cell compartment; something protein-shape doesn’t tell you). Watching for hand-holding has typically been easier.
Un-metaphoring: there are multiple variants of this broad class of technique, and there are even variants for DNA-DNA, DNA-protein, and RNA-protein interactions.
Here are some slightly de-metaphored executions:
Glue: A chimeric-protein with a sticky-end (and then isolating one of the proteins in a binding column, and checking what else tagged along).
Freeze: Chemicals that halt cellular processes and cause semi-random-binding (ideally reversible) of things that happen to be next to each other whenever you took the freeze-frame.
Grounding: Here that means either altering, removing, or silencing one protein, to see how it affects the behavior of another.
And of course, whenever you do this, you still have to isolate, sequence, and identify the batch of proteins you’ve nabbed.
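As a rough illustration of why "the combinatorics are not friendly" for simulating every possible pairing, here is a back-of-the-envelope sketch. The protein count is only an order-of-magnitude assumption for a human-sized proteome, and the per-pair simulation cost is entirely hypothetical.

```python
from math import comb

SECONDS_PER_YEAR = 3600 * 24 * 365


def pairwise_candidates(n_proteins):
    """Number of distinct unordered protein pairs among n_proteins."""
    return comb(n_proteins, 2)


def cpu_years(n_pairs, seconds_per_pair):
    """Total compute time, in CPU-years, to simulate every pair once."""
    return n_pairs * seconds_per_pair / SECONDS_PER_YEAR


if __name__ == "__main__":
    n = 20_000              # assumed order of magnitude, not a measured figure
    cost_per_pair = 60.0    # hypothetical: one CPU-minute per crude simulation
    pairs = pairwise_candidates(n)
    print(f"{pairs:,} candidate pairs")
    print(f"about {cpu_years(pairs, cost_per_pair):,.0f} CPU-years at {cost_per_pair:.0f} s per pair")
```

This counts only pairs; complexes of three or more proteins, and the question of whether two proteins ever share a cell type and compartment, make the simulation route look worse still, which is part of why catching the hand-holding directly has tended to win.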
Yes, I do know the physics involved on some level, and some about the computational methods.
I think that if deep learning can predict protein folding, then it should eventually be able to predict protein binding as well, since most of the physics is the same: it’s just amino acids on two different peptide chains interacting, instead of amino acids on the same chain.
On the other hand, predicting which reaction an enzyme catalyzes involves more physics, so it could be much harder: but then again, maybe it isn’t. Or maybe we can at least predict with which biomolecules a given protein is likely to react and do experimental work to find out the details.
That’s the dream.
I find it hard to believe your prediction that this breakthrough will be insignificant given what I’ve read in other reputable sources. I give a pretty high initial credence to the scientific claims of publications like Nature, which had this to say in their article on AlphaFold2:
The ability to accurately predict protein structures from their amino-acid sequence would be a huge boon to life sciences and medicine. It would vastly accelerate efforts to understand the building blocks of cells and enable quicker and more advanced drug discovery.