This seems similar to the natural impact regularization/bounded agency things I’ve been bouncing around. (Though my frame expects it to happen “by default” to a greater extent?) I like your way of describing/framing it.
Let’s make it concrete: we cannot just ask a powerful corrigible AGI to “solve alignment” for us. There is no corrigible way to perform a task which the user is confused about; tools don’t do that.
Strongly agree with this.