Yeah, “don’t care” is much too strong. This comment was just meant in the context of the current discussion. I could instead say:
The kind of alignment agenda that I’m working on, and the one we’re discussing here, does not rely on this kind of generalization of corrigibility; that generalization isn’t why we are talking about corrigibility.
However, I agree that there are lots of approaches to building AI that rely on some kind of generalization of corrigibility, and that studying those is interesting and I do care about how that goes.
In the context of this discussion I also would have said that I don’t care about whether honesty generalizes. But that’s also something I do care about even though it’s not particularly relevant to this agenda (because the agenda is attempting to solve alignment under considerably more pessimistic assumptions).