To me, this seems like disregarding contexts where the concepts are incongruous, to no particular gain.
In one direction, you might want to talk about a corrigible AI doing sweeping, unique, full-of-side-effects things that involve deep reasoning about what humans want, not just filling in a regularly-shaped hole in a human-made plan.
In another direction, you might want to talk about failures of tool-like things to be corrigible, perhaps even tool-like AIs that are hard to correct.
Maybe one way to phrase it is that tool-ness is the cause of powerful corrigible systems: it is a feature that can actually be expressed in reality and that has the potential to produce powerful corrigible systems, and there are no other known expressible features that lead to corrigibility.
So for us as notkilleveryoneists, a worst-case scenario would be to start advocating for suppressing tool-like AIs based on speculative failure modes instead of trying to solve those failure modes, and then to start chasing a hypothetical corrigible non-tool that cannot exist.