I assume that the learned components (world-model, value function, planner / actor) continue to be updated in deployment—a.k.a. online learning
If it’s not online learning in the strict sense, I’d expect a sufficient (and sufficiently-low-insight) process of updates and redeploys to have the same effect in the medium term. I agree that online learning is the most likely path in such scenarios, but I don’t think it’s necessary.
If it’s not online learning in the strict sense, I’d expect a sufficient (and sufficiently-low-insight) process of updates and redeploys to have the same effect in the medium term. I agree that online learning is the most likely path in such scenarios, but I don’t think it’s necessary.