Thank you so much for your kind words! I really appreciate it.
One definition of alignment is: will the AI do what we want it to do? And as your post compellingly argues, “what we want it to do” is not well-defined, because it is something a powerful AI would be able to influence. In many settings, a term that is easier to pin down rigorously, such as safe AI, trustworthy AI, or corrigible AI, may be more useful.
I would definitely count the AI’s drive toward self-improvement as part of the College Kid Problem! Sorry if the post did not make that clear.