Great post, Peter. I think a lot about whether it even makes sense to use the term “aligned AGI”, as powerful AGIs may break human intention for a number of reasons (https://www.lesswrong.com/posts/3broJA5XpBwDbjsYb/agency-engineering-is-ai-alignment-to-human-intent-enough).
I see you didn’t refer to AIs becoming self-driven (as in Omohundro: https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf). Is there a reason you don’t view this as part of the College Kid Problem?
Alignment needs something to align with, but it’s far from proven that there is a coherent set of values shared by all humans.
Thank you so much for your kind words! I really appreciate it.
One definition of alignment is: will the AI do what we want it to do? And as your post compellingly argues, “what we want it to do” is not well-defined, because it is something that a powerful AI could be able to influence. For many settings, it could be more useful to adopt a term that’s less difficult to rigorously pin down, like safe AI, trustworthy AI, or corrigible AI.
I would definitely count the AI’s drive towards self-improvement as a part of the College Kid Problem! Sorry if the post did not make that clear.