My first thought was that they put some convolutional layers in to preprocess the images and then used the GPT architecture, but no, it’s literally just GPT again....
Does this maybe give us evidence the brain isn’t anywhere near a peak of generality, since we use specialised circuits for processing image data (which convolutional layers were based on)?
Not necessarily. There is no gene which hardcodes a convolutional kernel into the brain which we can look at and say, ‘ah yes, the brain is implementing a convolution, and nothing else’. Attention mechanisms for images learn convolution-like patterns (just more flexibly, and not pre-hardwired): to the extent that convolutions are powerful because they exploit things like spatial locality (which is obviously true & useful), we would expect any more general learning algorithm to also learn similar patterns and look convolution-like. (This is a problem which goes back at least to, I think, von Neumann: the fact that the brain is universal makes it very hard to tell what algorithm it actually runs, as opposed to what algorithms it has learned to run.)
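To make the "attention can look convolution-like" point concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not how GPT or any trained model actually works: the attention scores are hand-set rather than learned, they depend only on relative position, and the function names (`attention_as_conv`, `direct_conv`) and the `sharpness` parameter are made up for this example. Each "head" attends almost entirely to one fixed offset, and the heads are mixed with the kernel weights, which reproduces a 1D convolution away from the edges.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def attention_as_conv(x, kernel, sharpness=50.0):
    # Hypothetical helper: one attention "head" per kernel tap.
    # Scores are hand-set (not learned) and depend only on relative position.
    n = len(x)
    radius = len(kernel) // 2
    pos = np.arange(n)
    rel = pos[None, :] - pos[:, None]  # rel[i, t] = t - i
    out = np.zeros(n)
    for weight, offset in zip(kernel, range(-radius, radius + 1)):
        # Score peaks where the key sits exactly `offset` away from the query,
        # so each softmax row is close to one-hot at position i + offset.
        scores = -sharpness * (rel - offset) ** 2
        attn = softmax(scores, axis=-1)
        # Mix the heads using the kernel weights.
        out += weight * (attn @ x)
    return out

def direct_conv(x, kernel):
    # Reference: sliding-window cross-correlation with zero padding.
    return np.correlate(x, kernel, mode="same")

rng = np.random.default_rng(0)
x = rng.normal(size=16)
kernel = np.array([0.25, 0.5, -1.0])

# Compare interior positions only: at the edges the softmax has no
# "out of range" token to attend to, so it differs from zero padding.
print(np.allclose(attention_as_conv(x, kernel)[1:-1], direct_conv(x, kernel)[1:-1]))
```

This only shows that attention is expressive enough to contain a convolution as a special case; whether a trained model actually settles on such patterns is an empirical question the sketch does not address.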