Wei Dai comments on Towards a mechanistic understanding of corrigibility

Wei Dai 30 Sep 2019 16:23 UTC
I thought Evan’s response missed my point (that “act-based corrigibility” as used in the OP doesn’t seem to be a kind of corrigibility as defined in the original corrigibility paper, but just a way to achieve corrigibility), so I had a chat with Evan about this on MIRIxDiscord (with Abram joining in). It turns out that by “act-based corrigibility” Evan meant both “a way of achieving something in the corrigibility cluster [by using act-based agents] as well as the particular thing in that cluster that you achieve if you actually get act-based corrigibility to work.”

The three of us talked a bit about finding better terms for these concepts but didn’t come up with any good candidates. My current position is that using “act-based corrigibility” this way is quite confusing and until we come up with better terms we should probably just stick with “achieving corrigibility using act-based agents” and “the kind of corrigibility that act-based agents may be able to achieve” depending on which concept one wants to refer to.

Wei Dai 1 Oct 2019 9:58 UTC
  
  My current position is that using “act-based corrigibility” this way is quite confusing and until we come up with better terms we should probably just stick with “achieving corrigibility using act-based agents” and “the kind of corrigibility that act-based agents may be able to achieve” depending on which concept one wants to refer to.
  
  Now that I’ve lived with understanding what Evan meant by “act-based corrigibility” for a while, I find that I’m having trouble holding on to my initial feeling of “this is likely to cause confusion to people”, despite consciously trying to, and it’s starting to feel more and more reasonable to use it the way Evan did. It seems like an interesting and revealing case of the illusion of transparency in action.