That would indeed be difficult. Maybe by downloading human media and reading stories, the AI could infer the jobs → supporting families link (this wouldn’t quite be strategy-stealing, since the AI doesn’t plan on having a family itself; it only aims to understand why the human has one). I’m not saying that these long-term dynamics are especially easy to learn if you only have a short time window; what I’m saying is that there is a non-zero incentive to learn long-term dynamics even in that case. I’m imagining what a perfect learning algorithm would do if it had to learn from 15-minute episodes (with correspondingly short-term values) starting from the same point in time, with access to video, text, and the internet; that agent would certainly need to learn a lot about long-term dynamics. Though I completely agree that you’d likely need to achieve very high model competence before it would try to squeeze out the short-term gains of long-term dynamics understanding, and we’re not really close to that point right now. In my view, extending the episode length is not required for building long-term dynamics models, merely very useful. And of course I agree that the model wouldn’t magically start planning for itself beyond the 15 minutes.
That would indeed be difficult. Maybe by downloading human media and reading stories, the AI could infer the jobs → supporting families link (this wouldn’t quite be strategy-stealing, since the AI doesn’t plan on having a family itself; it only aims to understand why the human has one).
I mean it’s the sort of thing I was trying to get at with the strategy-stealing section...
I’m not saying that these long-term dynamics are especially easy to learn if you only have a short time window; what I’m saying is that there is a non-zero incentive to learn long-term dynamics even in that case. I’m imagining what a perfect learning algorithm would do if it had to learn from 15-minute episodes (with correspondingly short-term values) starting from the same point in time, with access to video, text, and the internet; that agent would certainly need to learn a lot about long-term dynamics. Though I completely agree that you’d likely need to achieve very high model competence before it would try to squeeze out the short-term gains of long-term dynamics understanding, and we’re not really close to that point right now. In my view, extending the episode length is not required for building long-term dynamics models, merely very useful.
I think we might not really disagree, then. Not sure.
The thing about text is that it tends to contain exactly the sorts of high-level information that would usually be latent, so it is probably very useful to learn from.
And of course I agree that the model wouldn’t magically start planning for itself beyond the 15 minutes.
I should probably add that I think model timescale and planner timescale are mostly independent variables. You can set a planner to work on a very long timescale in a model that has been trained to be accurate for a very short timescale, or a planner to work on a very short timescale in a model that has been trained to be accurate for a very long timescale. Though I assume you’d get the most bang for your buck by having the timescales be roughly proportional.
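To make the independence point concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration: `one_step_model` is a hypothetical stand-in for a learned dynamics model, and `plan` is a brute-force planner rather than anything one would actually use. The point is only that the planning horizon is an argument to the planner, separate from however the model was trained.

```python
from itertools import product


def one_step_model(state, action):
    # Hypothetical stand-in for a learned dynamics model. The timescale it was
    # trained to be accurate over is a property of the model alone.
    return state + action


def rollout(model, state, actions):
    # Apply the one-step model repeatedly along an action sequence.
    for action in actions:
        state = model(state, action)
    return state


def plan(model, state, target, horizon, actions=(-1, 0, 1)):
    # Brute-force planner: try every action sequence of length `horizon` and
    # keep the one whose predicted final state is closest to `target`.
    return min(
        product(actions, repeat=horizon),
        key=lambda seq: abs(rollout(model, state, seq) - target),
    )


# Same model, two different planner timescales.
short_plan = plan(one_step_model, state=0, target=3, horizon=1)
long_plan = plan(one_step_model, state=0, target=3, horizon=5)
```

Whether the long-horizon plan is any good depends on how well the model's predictions hold up over that many steps, which is where the "roughly proportional" intuition comes in; the sketch does not model that degradation.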