One important factor seems to be that Eliezer often imagines scenarios in which AI systems avoid making major technical contributions, or revealing the extent of their capabilities, because they are lying in wait to cause trouble later. But if we are constantly training AI systems to do things that look impressive, then SGD will be aggressively selecting against any AI systems who don’t do impressive-looking stuff. So by the time we have AI systems who can develop molecular nanotech, we will definitely have had systems that did something slightly-less-impressive-looking.
This objection only holds if you imagine that AI systems are acquiring knowledge about the world / capability only through gradient descent, as opposed to training an agent that learns, and thereby becomes more capable, at runtime.
I would say: this objection holds for all AI designs that are being seriously considered to date. I agree it doesn’t apply in full generality to “not-modern-ML.” That said, you can use gradient descent to build agents that accumulate and manipulate knowledge (e.g. by reading and writing to databases, either in natural language or in opaque neuralese) and my argument applies just as well to those systems. I think you are imagining something more precise.
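To make that kind of system concrete, here is a minimal sketch in PyTorch of the architecture described above: a policy trained end-to-end with gradient descent that also reads from and writes to an external memory at runtime, so its knowledge can accumulate between gradient updates. The class name MemoryAgent, the toy objective, and the dimensions are illustrative assumptions, not anyone's actual proposal.

```python
import torch
import torch.nn as nn

class MemoryAgent(nn.Module):
    """Toy agent: gradient descent trains the weights, but knowledge also
    accumulates in an external memory that is read and written at runtime."""
    def __init__(self, dim=32):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)      # turns an observation into a memory query
        self.policy = nn.Linear(2 * dim, dim)   # acts on observation + retrieved memory
        self.memory = []                        # external store; persists across steps

    def forward(self, obs):
        query = self.encoder(obs)
        if self.memory:
            mem = torch.stack(self.memory)                # (N, dim)
            weights = torch.softmax(mem @ query, dim=0)   # attention over stored entries
            retrieved = weights @ mem                     # (dim,)
        else:
            retrieved = torch.zeros_like(query)
        # Write back to memory at runtime: capability can grow with experience
        # even if no further gradient updates ever happen.
        self.memory.append(query.detach())
        return self.policy(torch.cat([obs, retrieved]))

agent = MemoryAgent()
opt = torch.optim.SGD(agent.parameters(), lr=1e-2)
for step in range(10):
    obs = torch.randn(32)
    action = agent(obs)
    loss = action.pow(2).mean()   # stand-in objective; SGD shapes how memory is used
    opt.zero_grad()
    loss.backward()
    opt.step()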
I do agree that once you say “the techniques will just be totally different from ML” then I get more into “all bets are off,” and maybe then I could end up with a 50-50 chance of AI systems concealing their capabilities (though that still seems high). That said, I think you shouldn’t be confident of “totally different from ML” at this point:
I think you have at least a reasonable probability on “modern ML leads to transformative effects and changes the game,” especially if the transformation happens soon. If this paradigm is supposed to top out, I would like more precision about where and why.
Our reasoning about alternatives to ML seems really weak and uninformative, and it’s plausible that reasoning about ML is a better guide to what happens in the future even if there is a big transformation.
At this point it’s been more than 10 years with essentially no changes to the basic paradigm that would be relevant to alignment. Surely that’s enough to get to a reasonable probability of 10 more years?
This outcome already looked quite plausible to me in 2016, such that it was already worth focusing on, and it seems like evidence from the last 7 years makes it look much more likely.