Simplicia: The thing is, I basically do buy realism about rationality, and realism having implications for future powerful AI—in the limit. The completeness axiom still looks reasonable to me; in the long run, I expect superintelligent agents to get what they want, and anything that they don’t want to get destroyed as a side-effect. To the extent that I’ve been arguing that empirical developments in AI should make us rethink alignment, it’s not so much that I’m doubting the classical long-run story, but rather pointing out that the long run is “far away”—in subjective time, if not necessarily sidereal time. If you can get AI that does a lot of useful cognitive work before you get the superintelligence whose utility function has to be exactly right, that has implications for what we should be doing and what kind of superintelligence we’re likely to end up with.
I find it ironic that Simplicia’s position in this comment is not too far from my own, and yet my reaction to it was “AIIIIIIIIIIEEEEEEEEEE!” The shrieking is about the fact that everyone who thinks about alignment has models that are illegible to almost everyone else, and this thread is an example of that.