TurnTrout comments on Discussion with Eliezer Yudkowsky on AGI interventions

TurnTrout 11 Nov 2021 18:58 UTC
LW: 8 AF: 5
Maybe there’s been a lot of non-public work that I’m not privy to?
In Aug 2020 I gave formalizing corrigibility another shot, and got something interesting but wrong out the other end. Am planning to publish sometime, but beyond that I’m not aware of other attempts.
When I visited MIRI for a MIRI/CHAI social in 2018, I seriously suggested a break-out group in which we would figure out corrigibility (or the desirable property pointed at by corrigibility-adjacent intuitions) in two hours. I think more people should try this exact exercise more often—including myself.
- LawrenceC 11 Nov 2021 19:01 UTC
  3 points
  Yeah, we’ve also spent a while (maybe ~5 hours total?) in various CHAI meetings (some of which you’ve attended) trying to figure out the various definitions of corrigibility to no avail, but those notes are obviously not public. :(
  
  That being said, I don’t think failing in several hours of meetings and a few unpublished attempts is that much evidence of the difficulty?
  - TurnTrout 11 Nov 2021 23:16 UTC
    LW: 4 AF: 3
    I just remembered (!) that I have more public writing disentangling various forms of corrigibility and their benefits: “Non-obstruction: A simple concept motivating corrigibility.”