I admit that I’m more excited about doing this because you’re asking for it directly, so I actually believe there will be some answer (which in my experience is rarely the case for my in-depth comments).
Thanks!
I’m not sure if I agree that there is no connection. The mesa-objective comes from the interaction of the outer objective, the training data/environments and the bias of the learning algorithm. So in some sense there is a connection. Although I agree that for the moment we lack a formal connection, which might have been your point.
Right. By “no connection” I specifically mean “we have no strong reason to posit any specific predictions we can make about mesa-objectives from outer objectives or other details of training”—at least not for training regimes of practical interest. (I will consider this detail for revision.)
I could also have written down my plausibility argument (that there is actually “no connection”), but that would probably just distract from the point here.
(More later!)