Yeah, we’ve also spent a while (maybe ~5 hours total?) in various CHAI meetings (some of which you’ve attended) trying to figure out the various definitions of corrigibility to no avail, but those notes are obviously not public. :(
That being said, I don’t think failing in several hours of meetings/a few unpublished attempts is that much evidence of the difficulty?
I just remembered (!) that I have more public writing disentangling various forms of corrigibility and their benefits: “Non-obstruction: A simple concept motivating corrigibility.”