Sure, π∗ works for what I’m saying, assuming that the sum over time only includes the timesteps taken thus far. In that case, I’m saying that either:
the mesa optimizer doesn’t appear in π∗, in which case the problem is fixed by fully optimizing everything at every timestep (i.e. by using π∗), or
the mesa optimizer does appear in π∗, in which case the problem was really an outer alignment issue all along.