tailcalled comments on Corrigibility = Tool-ness?

tailcalled 28 Jun 2024 9:49 UTC
4 points
Maybe one way to phrase it is that tool-ness is the cause of powerful corrigible systems: it is a feature that can be expressed in reality and that has the potential to produce powerful corrigible systems, and no other known expressible feature leads to corrigibility.
So for us as notkilleveryoneists, a worst-case scenario would be to start advocating for suppressing tool-like AIs based on speculative failure modes instead of trying to solve those failure modes, and then to start chasing a hypothetical corrigible non-tool that cannot exist.