Sorry, you’re completely right about the first point. I’ll correct the original comment.
Re: learning weird stuff, they definitely do, but a lot of contemporary weirdness feels very data-dependent (e.g. I failed to realize my data was on a human-recognizably weird submanifold, like medical images from different hospitals with different patient populations) versus grokking-dependent (e.g. AlphaFold possibly figuring out new predictive principles underlying protein folding, or a hypothetical future model thinking about math textbooks for long enough that it solves a Millennium Prize problem).
EDIT: though actually AlphaFold might be a bad example, because it got to simulate a shit-ton of data, so maybe I’ll just stick to the “deep grokking of math” hypothetical.
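For concreteness on the data-dependent failure mode: a standard sanity check is a "site probe" — if a simple classifier can predict which hospital a sample came from using your features, your model can exploit that too. This is just a minimal sketch with entirely made-up synthetic data and a made-up artifact feature, not anyone's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d = 1000, 32
site = rng.integers(0, 2, size=n)       # hospital A vs. hospital B
feats = rng.normal(size=(n, d))
feats[:, 0] += 2.0 * site               # hypothetical site-specific artifact (e.g. scanner noise)

# If the site probe does much better than chance, the features encode which
# hospital the data came from -- a red flag that the "weird submanifold" is
# there for the model to latch onto instead of the actual pathology.
acc = cross_val_score(LogisticRegression(max_iter=1000), feats, site, cv=5).mean()
print(f"site-probe accuracy: {acc:.2f} (chance = 0.50)")
```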