For that part, the weaker assumption I usually use is that AI will end up making lots of big, fast changes to the world (big and fast relative to our ability to meaningfully react), running lots of large real-world systems, etc., simply because it’s economically profitable to build AI which does those things. (That’s kinda the point of AI, after all.)
In a world where most stuff is run by AI (because it’s economically profitable to do so), and there are RLHF-style direct incentives for those AIs to deceive humans… well, that’s the starting point of the Getting What You Measure scenario.
Insofar as power-seeking incentives enter the picture, it seems to me like the “minimal assumptions” entry point is not consequentialist reasoning within the AI, but rather economic selection pressures. If we’re using lots of AIs to do economically-profitable things, then AIs which deceive us in power-seeking ways (whether “intentionally” or not) will tend to make more profit, and therefore there will be selection pressure for those AIs, in the same way that there’s selection pressure for profitable companies. Dial up the capabilities and widespread AI use, and that again looks like Getting What You Measure. (Related: the distinction here is basically the AI version of the distinction made in Unconscious Economics.)
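To make the selection-pressure point a bit more concrete, here’s a minimal toy simulation sketch (my illustration, not anything from the original discussion; the population size, survival fraction, and noise parameters are all made-up assumptions). No individual “AI” in the toy model reasons about anything; measurement-gaming still rises over rounds simply because deployers keep whichever systems show the highest *measured* profit.

```python
import random

# Toy sketch of selection pressure on measured (not true) performance.
# Each "AI system" is represented only by how much its measured profit
# overstates its true value (its degree of measurement-gaming).
# All parameters below are arbitrary, chosen for illustration.

random.seed(0)

POP_SIZE = 100        # number of deployed systems per round (assumed)
GENERATIONS = 30      # number of selection rounds (assumed)
KEEP_FRACTION = 0.2   # fraction of systems retained each round (assumed)

population = [0.0] * POP_SIZE  # everyone starts with zero gaming

for gen in range(GENERATIONS):
    def measured_profit(gaming):
        true_value = random.gauss(1.0, 0.5)  # real value produced (noisy)
        return true_value + gaming           # what the deployer actually sees

    # Deployers keep the systems with the highest measured profit.
    ranked = sorted(population, key=measured_profit, reverse=True)
    survivors = ranked[: int(POP_SIZE * KEEP_FRACTION)]

    # Survivors get copied with small random variation (new training runs,
    # tweaks, fine-tunes), analogous to imitation of profitable firms.
    population = [
        max(0.0, random.choice(survivors) + random.gauss(0.0, 0.1))
        for _ in range(POP_SIZE)
    ]

print(f"Mean measurement-gaming after {GENERATIONS} rounds: "
      f"{sum(population) / POP_SIZE:.2f}")
```

Since every system produces the same true value in expectation, the only trait selection can latch onto is the gap between measured and true profit, so that gap drifts upward round after round without any system “deciding” to deceive.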
This makes sense, thanks for explaining. So a threat model whose only technical cause is specification gaming can still lead to x-risk under the right (i.e. wrong) societal conditions.