Yeah, I agree that there’s probably a bias towards biomolecules right now, but I think that’s relatively easy to fix. Use the naive approach to predict the amino acid sequence for a structure we want, actually synthesize that protein, check its structure with crystallography, and then retrain AlphaFold to predict the correct structure. If we repeat this procedure with sequences that differ more and more from biomolecules, we’ll gradually remove that bias from AlphaFold.
By “bias” I didn’t mean biases in the learned model; I meant “the class of proteins whose structures can be predicted by ML algorithms at all is biased towards biomolecules”. What you’re suggesting is still within the local search paradigm, which might not be sufficient for the protein folding problem in general, any more than it is sufficient for 3-SAT in general. No sampling is dense enough if large swaths of the problem space are discontinuous.