I would say: this objection holds for all AI designs that are being seriously considered to date. I agree it doesn’t apply in full generality to “not-modern-ML.” That said, you can use gradient descent to build agents that accumulate and manipulate knowledge (e.g. by reading and writing to databases, either in natural language or in opaque neuralese) and my argument applies just as well to those systems. I think you are imagining something more precise.
I do agree that once you say “the techniques will just be totally different from ML” then I get more into “all bets are off,” and maybe then I could end up with a 50-50 chance of AI systems concealing their capabilities (though that still seems high). That said, I think you shouldn’t be confident of “totally different from ML” at this point:
I think you have at least a reasonable probability on “modern ML leads to transformative effects and changes the game,” especially if the transformation happens soon. If this paradigm is supposed to top out, I would like more precision about where and why.
Our reasoning about alternatives to ML seems really weak and uninformative, and it’s plausible that reasoning about ML is a better guide to what happens in the future even if there is a big transformation.
At this point it’s been more than 10 years with essentially no changes to the basic paradigm that would be relevant to alignment. Surely that’s enough to assign a reasonable probability to 10 more years of the same?
This outcome already looked quite plausible to me in 2016, such that it was already worth focusing on, and it seems like evidence from the last 7 years makes it look much more likely.