I think it’s still an open question how much avoiding mesa-optimization entirely would hurt capabilities, but my sense is that mesa-optimization is likely inevitable if you want to build safe AGI that is competitive with a baseline unaligned approach. Thus, I tend to think the right strategy is to accept that you’re definitely going to produce a mesa-optimizer, and to have a really strong story for why it will be aligned.