abramdemski comments on My AGI Threat Model: Misaligned Model-Based RL Agent

abramdemski 13 May 2021 15:12 UTC
I have not properly read all of that yet, but my very quick take is that your argument for the need for online learning resembles your argument that the classic inner alignment problem does not apply to the architectures you are interested in. You find what I call mesa-learning implausible for the same reasons you find mesa-optimization implausible.
Personally, I’ve come around to the position (seemingly held pretty strongly by other folks, e.g. Rohin) that mesa-learning is practically inevitable for most tasks.