I like Seth’s thoughts on this, and I think Seth’s proposal and Max’s proposal end up pointing at a very similar path. I also think Max has some valuable insights, explained in his more detailed Corrigibility-as-a-target theory, which aren’t covered here.
I found it helpful to see Seth’s take evolve separately from Max’s: having both of them independently arrive at similar ideas made me more confident that the ideas are valuable.