I don’t know what this was a reference to, but amusingly I just noticed that the video I wanted to link was a 2007 lecture at Google by him (if it’s the same Geoffrey Hinton): https://www.youtube.com/watch?v=AyzOUbkUf3M
In it he explained a novel approach to handwriting recognition: stack increasingly small layers on top of each other until you're down to just a few dozen neurons, put an inverted pyramid on top of that bottleneck, then feed the network a lot of handwritten characters and use some sort of modified gradient descent to train it to reproduce the input image in the topmost layer as accurately as possible. Once that network is trained, use supervised learning with labeled data to train an ordinary small NN to map the bottleneck-layer activations to characters. And it worked!
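For concreteness, here's a rough sketch of how I picture that setup, in PyTorch (my choice, not from the talk). The layer sizes, the plain end-to-end backprop standing in for whatever modified gradient descent / layer-wise pretraining he actually used, and the random tensors standing in for real handwriting data are all just illustrative assumptions; the last couple of lines also show the "run it in reverse" idea I mention further down.

    # Rough sketch of the bottleneck-autoencoder idea described above.
    # All specifics (layer sizes, optimizer, loss) are illustrative guesses,
    # not what was actually used in the lecture.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BottleneckAutoencoder(nn.Module):
        # Pyramid of shrinking layers down to a few-dozen-unit bottleneck,
        # then a mirrored (inverted) pyramid back up to image size.
        def __init__(self, sizes=(784, 1000, 500, 250, 30)):
            super().__init__()
            enc = []
            for a, b in zip(sizes, sizes[1:]):
                enc += [nn.Linear(a, b), nn.Sigmoid()]
            rev = sizes[::-1]
            dec = []
            for a, b in zip(rev, rev[1:]):
                dec += [nn.Linear(a, b), nn.Sigmoid()]
            self.encoder = nn.Sequential(*enc)
            self.decoder = nn.Sequential(*dec)

        def forward(self, x):
            z = self.encoder(x)            # bottleneck code (30 units here)
            return self.decoder(z), z      # reconstruction plus the code

    # Phase 1 (unsupervised): train it to reproduce the input image. No labels
    # needed, so you can throw as much data at it as you like. Random tensors
    # stand in for flattened 28x28 handwriting images.
    ae = BottleneckAutoencoder()
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    images = torch.rand(256, 784)
    for _ in range(100):
        recon, _ = ae(images)
        loss = F.mse_loss(recon, images)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Phase 2 (supervised): a small, ordinary classifier reads the frozen
    # bottleneck activations and maps them to character labels.
    clf = nn.Linear(30, 10)
    clf_opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    labels = torch.randint(0, 10, (256,))
    with torch.no_grad():
        codes = ae.encoder(images)
    for _ in range(100):
        loss = F.cross_entropy(clf(codes), labels)
        clf_opt.zero_grad()
        loss.backward()
        clf_opt.step()

    # "Running it in reverse": feed the decoder an arbitrary bottleneck code
    # and look at the 784-pixel image the network imagines.
    with torch.no_grad():
        imagined = ae.decoder(torch.rand(1, 30))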
I find it interesting, especially in the context of your comment, because:
Unsupervised learning meant that you could feed it a LOT of data.
So you could force it to learn patterns in the data without overfitting.
It appeared uninspired by any natural neuronal organization.
It was a precursor to Deep Dream—you could run the network in reverse and see what it imagines when prompted with a specific digit.
It actually worked! And basically solved handwriting recognition, as far as I understand.
And so it felt like the first qualitative leap in the technology in decades, and a very impressive one at that, innovating in weird and unexpected ways in several respects. Sure, it would be another ten years until GPT-2, but some of the promise was definitely there, I think.
I guess the joke is not as well-known as I thought: https://twitter.com/pmddomingos/status/632685510201241600 (There’s a better page of Hinton stories somewhere, but I can’t immediately find it again.)