Wei Dai comments on Towards a mechanistic understanding of corrigibility

Wei Dai 1 Oct 2019 9:58 UTC
15 points

My current position is that using “act-based corrigibility” this way is quite confusing, and until we come up with better terms, we should probably just stick with “achieving corrigibility using act-based agents” and “the kind of corrigibility that act-based agents may be able to achieve”, depending on which concept one wants to refer to.

Now that I’ve lived with understanding what Evan meant by “act-based corrigibility” for a while, I find that I’m having trouble holding on to my initial feeling of “this is likely to cause confusion to people”, despite consciously trying to, and it’s starting to feel more and more reasonable to use it the way Evan did. It seems like an interesting and revealing case of the illusion of transparency in action.