> There is also a hazy counting argument for overfitting:
>
> It seems like there are “lots of ways” that a model could end up massively overfitting and still get high training performance. So absent some additional story about why training won’t select an overfitter, it feels like the possibility should be getting substantive weight.
>
> While many machine learning researchers have felt the intuitive pull of this hazy overfitting argument over the years, we now have a mountain of empirical evidence that its conclusion is false. Deep learning is strongly biased toward networks that generalize the way humans want—otherwise, it wouldn’t be economically useful.
I don’t know NN history well, but I have the impression that good NN training is not trivial. I expect the first n attempts at NN training went wrong in some way, including by overfitting. So, without already knowing how to train an NN without overfitting, you would get some overfitting in your experiments. The fact that now, after someone has already poured their brain juice into finding techniques that avoid the problem, you don’t get overfitting is not evidence that you shouldn’t have expected overfitting beforehand.
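As a toy illustration of this point (my own sketch, not from the article): fit a high-capacity model naively to a small noisy dataset and it interpolates the noise; only a deliberately chosen mitigation, here L2 regularization, restores generalization. The dataset, polynomial degree, and `l2` values are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny noisy dataset: 10 training points, 100 held-out test points.
def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.1, n)
    return x, y

x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

def features(x, degree=9):
    # High-capacity model: a degree-9 polynomial can interpolate all 10 points.
    return np.vander(x, degree + 1)

def fit(x, y, l2=0.0):
    # Ridge normal equations; l2=0 recovers plain least squares.
    A = features(x)
    return np.linalg.lstsq(A.T @ A + l2 * np.eye(A.shape[1]),
                           A.T @ y, rcond=None)[0]

def mse(w, x, y):
    return np.mean((features(x) @ w - y) ** 2)

for l2 in (0.0, 1e-2):
    w = fit(x_train, y_train, l2)
    print(f"l2={l2}: train MSE={mse(w, x_train, y_train):.4f}, "
          f"test MSE={mse(w, x_test, y_test):.4f}")
```

On a typical run, the unregularized fit drives training error to near zero while test error blows up, whereas the regularized fit keeps both near the noise floor. Someone who had never heard of regularization would hit the first outcome every time, which is the sense in which “you’d get some overfitting in your experiments.”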
The analogy with AI scheming is: you don’t already know the techniques to avoid scheming. You can’t use, as a counterargument, a case in which the problem has already been deliberately solved. If you take that same case and put yourself in the shoes of someone who doesn’t already have the solution, you can see you’d get the problem in your face a few times before solving it.
Then it becomes a matter of whether it works the way Yudkowsky says: that you may only get one chance to solve it.
The title says “no evidence for AI doom in counting arguments”, but the article mostly talks about neural networks (not AI in general), and its conclusion is:
> In this essay, we surveyed the main arguments that have been put forward for thinking that future AIs will scheme against humans by default. We find all of them seriously lacking. We therefore conclude that we should assign very low credence to the spontaneous emergence of scheming in future AI systems—perhaps 0.1% or less.
On “main arguments”: I don’t think counting arguments completely fill up this category. For example, the concept of scheming originates from observing it in humans.
Overall, I have the impression of some overstatement. It may also be that I’m missing previous discussion context or assumptions, such that other background theory of yours says “humans don’t matter as examples” and “AI will be NNs and not other things”.