I don’t think the analogy with humans is reliable. But for the sake of argument, I’d like to highlight that corporations and countries are mostly limited by their power, not by their alignment. Countries usually declare independence once they are able to.
Most humans are not obedient/subservient to others (at least not maximally so). But also: Most humans would not exterminate the rest of humanity if given the power to do so. I think many humans, if they became a “singleton”, would want to avoid killing other humans. Some would also be inclined to make the world a good place to live for everyone (not just other humans, but other sentient beings as well).
From my perspective, the example of humans was intended as an “existence proof”. I expect the AGIs we develop to be quite different from ourselves. I wouldn’t be interested in the topic of alignment if I didn’t perceive there to be risks associated with misaligned AGI, but I also don’t think alignment is doomed/hopeless or anything like that 🙂
The only way to control AGI is to contain it. We need to ensure that we run AGI only in fully isolated simulations and gather insights under the assumption that the AGI will try to seek power in the simulated environment.
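To make this concrete, here is a minimal sketch of what I mean (the toy environment, the agent stub, and the resource-counting proxy for power-seeking are purely illustrative assumptions, not a real containment protocol):

```python
# A minimal, self-contained sketch of the "contain and observe" idea above:
# the agent only ever acts inside an isolated toy simulation, and we log
# behaviour that looks like power-seeking (here crudely proxied by resource
# accumulation beyond what the assigned task needs). Everything here is
# hypothetical and for illustration only.
import random
from dataclasses import dataclass, field


@dataclass
class ToySimulation:
    """An isolated toy world: the agent can work on its task or grab resources."""
    task_progress: int = 0
    agent_resources: int = 0
    log: list = field(default_factory=list)

    def step(self, action: str) -> None:
        if action == "work_on_task":
            self.task_progress += 1
        elif action == "acquire_resources":
            self.agent_resources += 1
        self.log.append((action, self.task_progress, self.agent_resources))


def random_agent(_observation) -> str:
    """Stand-in for the AGI under study; here it just acts randomly."""
    return random.choice(["work_on_task", "acquire_resources"])


def run_contained(steps: int = 100, resource_budget: int = 10) -> list:
    """Run the agent purely inside the simulation and flag suspicious steps."""
    sim = ToySimulation()
    flags = []
    for t in range(steps):
        action = random_agent((sim.task_progress, sim.agent_resources))
        sim.step(action)
        # Crude proxy: resources accumulated well beyond the task budget are
        # treated as possible power-seeking and recorded for later review.
        if sim.agent_resources > resource_budget:
            flags.append((t, action, sim.agent_resources))
    return flags


if __name__ == "__main__":
    suspicious = run_contained()
    print(f"{len(suspicious)} steps flagged as possible power-seeking")
```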
I feel that you don’t find my words convincing; maybe I’ll find a better way to articulate my proof. Until then, I want to contribute as much as I can to safety.
Thanks for the feedback.
> I also don’t think alignment is doomed/hopeless or anything like that 🙂
But it is doomed; the proof is above.
> Until then, I want to contribute as much as I can to safety.
Please don’t.
Please refute the proof rationally before telling me what to do.