Strong disagree with that particular conception of “minimality” being desirable. A desirable conception of “minimal” AGI from my perspective would be one which can be meaningfully aligned with humanity while being minimally dangerous, full stop. Getting that is still useful because it at least gets you knowledge you could use to make a stronger one later.
If you add “preventing immediately following AGIs from destroying the world” to the desiderata and remove “meaningfully aligned”, your attempted clever scheme to carry out a pivotal act and then shut down will:
a) fail to shut down soon enough, and destroy the world
b) get everyone really angry, then we repeat the situation but with a worse mindset
c) incentivize the AGI’s creators to re-deploy it to prevent (b); if they succeed at that and also avoid (a), they end up ruling the world and being forced into tyrannical rule by their lack of legitimacy
and in addition to the above:
If you plan to do that, everyone who doesn’t agree with that plan is incentivized to accelerate their own plans, and to make those plans more focused on being capable of enacting changes to the world, so as to beat you to the punch. If you want to avoid race dynamics, you need to focus on not destroying the world with your own project, not on other people’s projects.
P.S. Unlike avturchin, I don’t actually object to openly expecting an AI to “take over the world”, if you can make a strong enough case that your AI is aligned properly. My objection is primarily to illegitimate actions, and I think a strong and believed-to-be-aligned AI can be expected to de facto take over in ways that are reliably perceived as (and thus are) legitimate. Taking actions that their own planners refuse to specify exactly, because they admit those actions are “outside the Overton window”, is an entirely different matter!
Very soon (months?) after the first real AGI is made, all AGIs will be aligned with each other, and all newly made AGIs will also be aligned with those already existing. One way or another.
The question is how much of humanity will still exist by that time, and whether those AGIs will also be aligned with humanity.
But yes, I think it’s possible to get to that state in a relatively non-violent and lawful way.