Great post, Peter. I think a lot about whether it even makes sense to use the term “aligned AGI”, as powerful AGIs may break human intention for a number of reasons (https://www.lesswrong.com/posts/3broJA5XpBwDbjsYb/agency-engineering-is-ai-alignment-to-human-intent-enough).
I see you didn’t refer to AIs becoming self-driven (as in Omohundro: https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf). Is there a reason you don’t view this as part of the College Kid Problem?
Alignment needs something to align with, but it’s far from proven that there is a coherent set of values shared by all humans.
Thank you so much for your kind words! I really appreciate it.
One definition of alignment is: will the AI do what we want it to do? And as your post compellingly argues, “what we want it to do” is not well-defined, because it is something that a powerful AI could be able to influence. For many settings, it could be more useful to adopt a term that’s less difficult to rigorously pin down, like safe AI, trustworthy AI, or corrigible AI.
I would definitely count the AI’s drive towards self-improvement as a part of the College Kid Problem! Sorry if the post did not make that clear.