After an exchange with Ryan, I see that I could’ve stated my point a bit more clearly. It’s something more like: “the algorithms that the current SOTA AIs execute during their forward passes do not necessarily capture all the core dynamics that would happen within an AGI’s cognition, so extrapolating the limitations of their cognition to AGI is a bold claim we have little evidence for”.
So, yes, studying weaker AIs sheds some light on stronger ones (that’s why there’s a “nearly” in “nearly no data”), so studying CNNs in order to learn about LLMs before LLMs existed wasn’t totally pointless. But the lessons you’d learn would be more about “how to do interpretability on NN-style architectures”, “what are SGD’s biases?”, “how precisely does matrix multiplication implement algorithms?”, and so on.
Not “what precise algorithms does an LLM implement?”.
the algorithms that the current SOTA AIs execute during their forward passes do not necessarily capture all the core dynamics that would happen within an **actual** AGI’s cognition, so extrapolating the limitations of their cognition to **future** AGI is a bold claim we have little evidence for
I suggest putting this at the top as a tl;dr (with the additions I bolded to make your point clearer).