Re online learning: I acknowledge that a lot of reasonable people agree with you on that, and it’s hard to know for sure. But I argued my position in "Against evolution as an analogy for how humans will build AGI".
Also there: a comment thread about why I’m skeptical that GPT-N would be capable of doing the things we want AGI to do, unless we fine-tune the weights on the fly, in a manner reminiscent of online learning (or amplification).
I have not properly read all of that yet, but my very quick take is that your argument for a need for online learning strikes me as similar to your argument against the classic inner alignment problem applying to the architectures you are interested in. You find what I call mesa-learning implausible for the same reasons you find mesa-optimization implausible.
Personally, I’ve come around to the position (seemingly held pretty strongly by other folks, e.g., Rohin) that mesa-learning is practically inevitable for most tasks.