That would indeed be difficult. Maybe by downloading human media and reading stories, the AI could infer the jobs → supporting families link (this wouldn’t quite be strategy-stealing, since the AI doesn’t plan on having a family itself; it would just be trying to understand why the human has one).
I mean it’s the sort of thing I was trying to get at with the strategy-stealing section...
I’m not saying that these long-term dynamics are especially easy to learn if you only have a short time window. What I’m saying is that there is a non-zero incentive to learn long-term dynamics even in that case. I’m imagining what a perfect learning algorithm would do if it had to learn from 15-minute episodes (with correspondingly short-term values), all starting from the same point in time, with access to video, text, and the internet. That agent would certainly need to learn a lot about long-term dynamics. Though I completely agree that you’d likely need to achieve very high model competence before it would try to squeeze out the short-term gains of long-term dynamics understanding, and we’re not really close to that point right now. In my view, extending the episode length is not required for building long-term dynamics models, merely very useful.
I think we might not really disagree, then. Not sure.
The thing about text is that it tends to contain exactly the sorts of high-level information that would usually be latent, so it is probably very useful to learn from.
And of course I agree that the model wouldn’t magically start planning for itself beyond the 15 minutes.
I should probably add that I think model timescale and planner timescale are mostly independent variables. You can set a planner to work on a very long timescale in a model that has been trained to be accurate for a very short timescale, or a planner to work on a very short timescale in a model that has been trained to be accurate for a very long timescale. Though I assume you’d get the most bang for your buck by having the timescales be roughly proportional.
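To make the "independent variables" point concrete, here is a minimal toy sketch, not anything from an actual system. Everything in it is made up for illustration: `one_step_model` stands in for a learned dynamics model that only ever predicts one step ahead, `reward` is an arbitrary goal, and `plan` is a brute-force planner whose `horizon` is just a parameter you pick. The toy model is exact, so the sketch says nothing about how accurate long rollouts of a short-timescale model would be in practice. It only shows that nothing ties the planner's horizon to the model's.

```python
from itertools import product


def one_step_model(state, action):
    """Hypothetical stand-in for a learned dynamics model.

    It only predicts one step ahead: the position moves by `action`
    and is clipped to the range [0, 10].
    """
    return max(0, min(10, state + action))


def reward(state):
    # Hypothetical reward: 1 for each step spent at position 10, else 0.
    return 1.0 if state == 10 else 0.0


def plan(state, model, horizon, actions=(-1, 0, 1)):
    """Brute-force planner: roll the model forward over every action
    sequence of length `horizon` and return (best_return, first_action)."""
    best_return, best_first = float("-inf"), None
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = model(s, a)  # the planner chains one-step predictions
            total += reward(s)
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_return, best_first


# Same one-step model, two different planner timescales.
short = plan(7, one_step_model, horizon=2)  # the goal is out of reach
long = plan(7, one_step_model, horizon=5)   # the goal is reachable
```

With `horizon=2` the planner sees no reward anywhere and has no reason to move toward the goal. With `horizon=5` the same model supports a plan that heads straight for it. The planner's timescale changed and the model did not.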