Goal misgeneralisation could lead to a generalised preference for switches to be in the “OFF” position.
The AI could, for example, want to prevent future activations of modified successor systems. The intelligent self-turning-off “useless box” doesn’t just flip the switch back; it destroys itself and anything that could re-create it.
Until we solve goal misgeneralisation and alignment in general, I think any ASI will be unsafe.