TurnTrout comments on Trying to isolate objectives: approaches toward high-level interpretability

TurnTrout 12 Jan 2023 6:30 UTC
LW: 3 AF: 3
Yeah, IMO the “RL at scale trains search-based mesa optimizers” hypothesis predicts “solving randomly generated mazes via a roughly unitary mesa objective and heuristic search” with reasonable probability, and that seems like a toy domain to me.