I’m confused about algorithms vs scaling conversations. I’ve been noodling around this question space since reading Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? a couple weeks ago, and the more I think about predictions in terms of algorithms and scale, the more complete my confusion becomes.
I currently have it head-chunked this way:
Architecture: neural nets are a family of architectures. Transformers are one specific architecture within that family.
Meta-algorithm: the arrangement of, and training procedure for, the stack of transforms; this seems to be what people are referring to when they say algorithmic improvements. I think of it this way because it is an algorithm-finding algorithm.
Algorithm: the learned algorithm found by the meta-algorithm, which is what actually handles inputs in production. There is high uncertainty about what exactly these algorithms are, since we can only easily see the weights/activations (see the code sketch after this list).
Algorithmic uncertainty: by default we don’t know what the algorithm is in any detail. Mechanistic Interpretability is about reducing algorithmic uncertainty.
Scale compute. This means things like GPUs, TPUs, total number of FLOPs, etc.
Scale data. Normally this means unstructured data; I don’t actually have any sense of what scaling structured data looks like, other than that it really seemed to help AlphaFold.
Scale models. This means increasing the number of trained parameters, which seems to correlate with compute but strictly speaking should be independent of it (see the arithmetic sketch below).
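To make the first three terms concrete, here is a minimal toy sketch of how I map them onto code. Everything here (the numpy setup, the toy squaring task, all the names) is my own illustration rather than anything from the conversations themselves: the architecture is the parameterized function family, the meta-algorithm is the training loop that searches that family, and the algorithm is the frozen learned function we then run on new inputs.

```python
import numpy as np

# Architecture: the family of functions we can represent.
# Here, a one-hidden-layer neural net with a tanh nonlinearity.
def forward(params, x):
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

# Meta-algorithm: the procedure that searches the family for a specific
# function -- here, plain gradient descent on squared error, with the
# gradients worked out by hand for this tiny architecture.
def train(x, y, hidden=8, steps=5000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(steps):
        h = np.tanh(x @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y                    # dLoss/dpred, up to a constant
        dW2 = h.T @ err / len(x)
        db2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1 - h**2)    # backprop through tanh
        dW1 = x.T @ dh / len(x)
        db1 = dh.mean(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2)

# Algorithm: the learned function itself -- frozen weights applied to new
# inputs. We can inspect every number in params, yet still not know what
# procedure they implement; that gap is the algorithmic uncertainty above.
x = np.linspace(-1, 1, 64).reshape(-1, 1)
y = x**2                                  # toy target: square the input
params = train(x, y)
print(forward(params, np.array([[0.5]]))) # roughly 0.25 if training worked
```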
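And to make the relationship between the three scaling knobs concrete: for dense transformers there is a widely used rule of thumb from the scaling-laws literature that training compute is roughly 6 × parameters × tokens, so compute is (approximately) the product of the model-scale and data-scale knobs rather than a fully free third knob. A minimal sketch, using GPT-3-ish numbers purely for illustration:

```python
# Rough training-compute estimate: FLOPs ~ 6 * parameters * tokens.
# The constant 6 is a common rule of thumb for dense transformers;
# the specific numbers below are illustrative assumptions.
params = 175e9   # model scale: 175B parameters
tokens = 300e9   # data scale: 300B training tokens
flops = 6 * params * tokens
print(f"{flops:.2e} training FLOPs")   # ~3.15e+23
```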
But none of the conversations seem to match these divisions, which means at the very least that I am out of sync with everyone else, and probably that I am wrong along several dimensions.