Very grim. I think that almost everybody is bouncing off the real hard problems at the center and doing work that is predictably not going to be useful at the superintelligent level, nor does it teach me anything I could not have said in advance of the paper being written. People like to do projects that they know will succeed and will result in a publishable paper, and that rules out all real research at step 1 of the social process.
This is an interesting critique, but it feels off to me. There’s actually a lot of ‘gap’ between the neat theoretical explanation of something in a paper and actually building it. I can imagine many papers where I might say:
“Oh, I can predict in advance what will happen if you build this system with 80% confidence.”
But if you just kinda like, keep recursing on that:
“I can imagine what will happen if you build the n+1 version of this system with 79% confidence...”
“I can imagine what will happen if you build the n+2 version of this system with 76% confidence...”
“I can imagine what will happen if you build the n+3 version of this system with 74% confidence...”
It’s not so much that my confidence starts dropping (though it does), as that you are beginning to talk about a fairly long lead time in practical development work.
As anyone who has worked with ML knows, it takes a long time to get a functioning code base with all the kinks ironed out and methods that do the things they theoretically should do. So I could imagine a lot of AI safety papers whose results are, fundamentally, completely predictable, but where building the system they describe is still very useful for developing your implementing-AI-safety muscles.
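To put rough numbers on that, here is a toy sketch of the recursion above. The per-version confidences are the ones I gave; the months-per-version figure is a made-up assumption purely for illustration. The point is that the confidence column barely moves, while the cumulative development time is what dominates.

```python
# Toy illustration of the argument above. The confidences are the ones quoted
# in the comment; the lead time per version is an assumed, made-up figure.
confidences = [0.80, 0.79, 0.76, 0.74]  # predicted outcome confidence for versions n..n+3
months_per_version = 9                   # assumed engineering lead time per iteration

for k, c in enumerate(confidences):
    elapsed = months_per_version * (k + 1)
    print(f"version n+{k}: confidence {c:.0%}, ~{elapsed} months of cumulative development")
```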
I’m also concerned that you admit you have no theoretical angle of attack on alignment, but seem to see empirical work as hopeless. AI is full of theory developed as post-hoc justification of what starts out as empirical observation. To quote an anonymous person who is familiar with the history of AI research:
REDACTED:
Yeah. This is one thing that soured me on Schmidhuber. I realized that what he is doing is manufacturing history.
Creating an alternate reality/narrative where DL work flows from point A to point B to point C every few years, when in fact, B had no idea about A, and C was just tinkering with A.
Academic pedigrees reward post hoc ergo propter hoc on a mass scale.
And of course, post-AlphaGo, I find this intellectual forging to be not merely annoying and bad epistemic practice, but a serious contribution to X-Risk.
By falsifying how progress actually happened, it prioritizes any kind of theoretical work, downplaying empirical work, implementation, trial-and-error, and the preeminent role of compute.
In Schmidhuber’s history, everyone knows all about DL and meta-learning, and DL history is a grand triumphant march from the perceptron to the neocognitron to Schmidhuber’s LSTM to GPT-3 as a minor uninteresting extension of his fast-memory work, all unfolding exactly as foreseen.
As opposed to what actually happened, which was a bunch of apes poking in the mud, drawing symbols, grunting to each other, until a big monolith containing a thousand GPUs appeared out of nowhere, the monkeys punched the keyboard a few times, and bowed in awe.
And then going back and saying ‘ah yes, Grog foresaw the monolith when he smashed his fist into the mud and made a vague rectangular shape’.
My usual example is ResNets. Super important, one of the most important discoveries in DL...and if you didn’t read a bullshit PR interview MS PR put out in 2016 or something where they admit it was simply trying out random archs until it worked, all you have is the paper placidly explaining “obviously resnets are a good idea because they make the gradients flow and can be initialized to the identity transformation; in accordance with our theory, we implemented and trained a resnet cnn on imagenet...”
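(An aside for readers who haven’t run into ResNets: below is a minimal sketch of the residual-block idea the quote alludes to, written in PyTorch purely for illustration and not the exact block from the paper. The skip connection is what ‘makes the gradients flow’, and zero-initializing the last layer means the block starts out as the identity transformation.)

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: y = x + F(x). Illustrative sketch only,
    not the exact architecture from the ResNet paper."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # Zero-init the last conv so F(x) = 0 at the start: the block begins
        # as the identity map, and gradients pass straight through the skip.
        nn.init.zeros_(self.conv2.weight)
        nn.init.zeros_(self.conv2.bias)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))
```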
Discouraging the processes by which serendipity can occur when you have no theoretical angle of attack seems suicidal to me, to put it bluntly. While I’m quite certain there is a large amount of junk work on AI safety, we would likely do well to put together some kind of process where more empirical approaches are taken faster, with more opportunities for ‘a miracle’, as you termed it, to arise.
[I am a total noob on history of deep learning & AI]
From a cursory glance I find Schmidhuber’s take convincing.
He argues that the (vast) majority of conceptual & theoretical advances in deep learning have been understood decades before—often by Schmidhuber and his collaborators.
Moreover, he argues that many of the current leaders in the field improperly credit previous discoveries.
It is unfortunate that the above poster is anonymous. It is very clear to me that there is a big difference between theoretical & conceptual advances and the great recent practical advances due to stacking MOAR layers.
It is possible that the remaining steps to AGI consist of just stacking MOAR layers: compute + data + comparatively small advances in data/compute efficiency + something something RL Metalearning will produce an AGI. Certainly, not all problems can be solved [fast] by incremental advances and/or iterating on previous attempts. Some can. It may be the unfortunate reality that creating [but not understanding!] AGI is one of them.