Hm, I think an important piece of the “intuitive proof” didn’t transfer, or is broken. Drawing attention to that part:
Regardless of the details of how “decisions” are made, it seems easy for the choice to land on one of the massive array of outcomes that become possible once you have control of the light-cone, which is what acquiring power makes possible.
So here, I realize, I am relying on something like “the AI implicitly moves toward an imagined realizable future”. I think that’s a lot easier to get than the pipeline you sketch.
I think I’m being pretty unclear—I’m having trouble conveying my thought structure here. I’ll go make a meta-level comment instead.
As I understand it, your argument is that there are many dangerous world-states and few safe ones, so in the spirit of entropy, most powerful agents would end up moving to a dangerous state. This seems reasonable.

An alarming version of this argument assumes the agents already have power. I think they don't, and that acquiring dangerous amounts of power is hard work that won't happen by accident.

A milder version of the same argument says that even relatively powerless, unaligned agents would slowly and unknowingly inch toward more dangerous world-states. This is probably true, but as long as humans retain some control, it is probably harmless. It is also debatable to what extent that sort of probabilistic argument applies to a complex machine.
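For what it's worth, here is a minimal sketch of the counting intuition behind the entropy-style version of the argument. Everything specific in it (the state counts, the i.i.d. uniform-utility model, the variable names) is my own illustrative assumption rather than anything from the discussion: if dangerous world-states vastly outnumber safe ones, a utility function drawn at random is almost always maximized at a dangerous state.

```python
import random

# Toy counting argument: with many more "dangerous" world-states than
# "safe" ones, a randomly drawn utility function almost always has its
# maximum at a dangerous state. All numbers below are illustrative.

N_SAFE = 10         # assumed number of safe world-states
N_DANGEROUS = 990   # assumed number of dangerous world-states
N_SAMPLES = 20_000  # number of random utility functions to sample

random.seed(0)
dangerous_wins = 0
for _ in range(N_SAMPLES):
    # Give every state an i.i.d. uniform utility and see where the maximum lands.
    best_safe = max(random.random() for _ in range(N_SAFE))
    best_dangerous = max(random.random() for _ in range(N_DANGEROUS))
    if best_dangerous > best_safe:
        dangerous_wins += 1

print(f"fraction of random utilities maximized at a dangerous state: "
      f"{dangerous_wins / N_SAMPLES:.3f}")
# Analytically this fraction is N_DANGEROUS / (N_SAFE + N_DANGEROUS) = 0.99.
```

Of course, this says nothing about whether a real system's preferences behave anything like a random draw over states, which is roughly the objection in the last paragraph above.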