Hi, thanks both for writing this—I enjoyed it.
I’d be interested in your thoughts on how we can do this:
> However, perhaps more [emphasis] should be placed on building just-as-powerful AI systems that are restricted to short time horizons.
I can share some of my thoughts first, and would be keen to hear (both/either of) yours.
It’s worth saying up front that I also think this is a very productive direction, and that your post lays out a good case for why.
Here’s one salient baseline strategy, and a corresponding failure mode: At each point in time t, there is some time horizon h over which AIs can pursue goals. At each point in time, (social-impact-minded) AI companies aim to build “short-term goal” systems, which operate at horizon h, but not longer.
Note that this seems very natural, and also exactly matches what one might expect a purely profit- or prestige-driven company to do.
However, if h turns out to steadily increase over time (as in the straight-line extrapolation referenced below), then this leaves us in a difficult position.
Furthermore, this seems to leave us at the mercy of unknown empirical questions about deep learning. It doesn’t seem we have improved our chances relative to the baseline of “do the easiest thing at each point in time”.
So from a “differential progress” perspective, a more helpful research strategy seems to be: fix a task with some horizon h_1, then try to solve this task using only systems optimized over horizon h_2 < h_1.
This framing also possibly highlights two additional difficulties: (1) technically, end-to-end optimization has been quite effective for many DL tasks, and (2) sociologically, the DL community has a tremendous aesthetic preference for end-to-end approaches (which also translates into tooling and infrastructure that favor them), which makes it harder to gain widespread adoption for other approaches. I also agree there are many offsetting factors, like interpretability and control, which you mention.
This suggests empirical angles similar to the one described [here](https://ought.org/updates/2022-04-06-process) by Ought.
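To make the h_2 < h_1 strategy concrete, here is a minimal, purely illustrative Python sketch. All names here (short_horizon_step, the "DONE" signal, toy_step) are hypothetical stand-ins, not anyone's actual API or implementation; the structural point is just that the learned component only ever sees short-horizon subproblems, while the long-range structure lives in a fixed outer loop rather than in any end-to-end-optimized system. This is the flavor of the process-based angle linked above, not Ought's implementation.

```python
# Illustrative sketch only: "short_horizon_step" stands in for any system
# optimized over a short horizon h2; the outer loop composes it to address
# a task whose natural horizon h1 is much longer. No single optimization
# pass sees more than h2 steps.

from typing import Callable, List

def solve_long_task(
    task: str,
    short_horizon_step: Callable[[str, List[str]], str],
    max_steps: int = 100,
) -> List[str]:
    """Decompose a horizon-h1 task into a sequence of horizon-h2 calls.

    `short_horizon_step` is assumed to be optimized only on
    (task, history) -> next subresult over short horizons; the long-range
    structure lives in this hand-written loop, not in the learned system.
    """
    history: List[str] = []
    for _ in range(max_steps):
        result = short_horizon_step(task, history)
        history.append(result)
        if result == "DONE":  # hypothetical completion signal
            break
    return history

# Toy stand-in for the short-horizon system (purely for illustration):
def toy_step(task: str, history: List[str]) -> str:
    return "DONE" if len(history) >= 3 else f"subresult {len(history) + 1}"

if __name__ == "__main__":
    print(solve_long_task("write a literature review", toy_step))
```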
I’d be very interested in how you think about this research direction, and particularly interested in whether you think there are other/complementary research directions which improve our chances of ending up with short-time-horizon AIs.
2. I understand this matches up with your core point, but to check understanding and confirm agreement: it seems far from certain that so-called “short-term goal” AIs will dominate, and given these uncertainties, it seems worth spelling out what follows.
I would guess you are both somewhat more optimistic about “short-term goal AIs” than I am (in most discussions, I normally find myself arguing for competitiveness of short-term goal AIs, so it’s a nice change-of-perspective for me!). But I imagine(?) we might have similar views that:
- So long as p(“short-term AIs dominate”) is not very close to 1, this leaves substantial risk.
- So long as p(“short-term AIs dominate”) is not very close to 1, there is reasonable room for dedicated efforts to push this probability up (or down).
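(To illustrate the second point with made-up numbers: if p(“short-term AIs dominate”) were 0.9, a dedicated effort that pushed it to 0.95 would halve the residual risk. The numbers are purely illustrative, but they show why marginal pushes on this probability can matter even when the baseline already looks favorable.)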
There are many reasons this seems uncertain, but to spell out one: As you correctly point out, the applications where DL is useful right now (to the extent there are any) are predominantly short-term ones. But it also seems that the natural trend over time would be for AIs to become competent at increasingly broad tasks, and so the current situation doesn’t provide much evidence about whether to expect this straight-line extrapolation vs. the Goldilocks effect you describe.
3. As a final point, regarding this:
> Loss-of-Control Hypothesis 2: In several key domains, only AIs with long-term goals will be powerful. ... Why is Hypothesis 2 necessary for the “loss of control” scenario? The reason is that this scenario requires the “misaligned long-term powerful AI” to be not merely more powerful than humanity as it exists today, but more powerful than humanity in the future. Future humans will have at their disposal the assistance of short-term AIs.
A salient possibility for me is that long-term-optimized AIs are only mildly more powerful (say, equivalent to a 50% compute increase), but that this is enough for almost everyone to use long-term AIs. If more of the world (ML community, leaders, regulators, general public) agrees that long-horizon-optimized AIs are more dangerous than short-horizon-optimized AIs, then short-horizon-optimized AIs can become the norm in spite of this. But it seems unclear to what extent this will happen, and so this similarly seems well worth pushing on, for a social impact-minded person/team.
I would love to discuss more in the future, but in the interest of time (and because this is already quite long), I’m starting with what I expect are the most fruitful lines.
Thank you! I think what we see right now is that the longer the horizon grows, the more “tricks” we need to make end-to-end learning work, to the extent that it might not really be end to end. So while supervised learning is very successful, and seems to be quite robust to the choice of architecture, loss function, etc., in RL we need to be much more careful, and often things won’t work “out of the box” in a purely end-to-end fashion.
I think the question would be how performance scales with horizon. If the returns are rapidly diminishing and the cost to train is rapidly increasing (as might well be the case because of diminishing gradient signals and much smaller availability of data), then the “sweet spot” of what is economical to train could remain at a reasonably short horizon (far shorter than the planning needed to take over the world) for a long time.
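To make the shape of that argument concrete, here is a toy Python calculation under purely assumed functional forms (logarithmic returns to horizon, power-law training cost). None of these numbers come from real measurements; the point is only that an interior “sweet spot” falls out naturally whenever returns diminish faster than costs grow.

```python
# Toy model of the "sweet spot" argument, with made-up functional forms:
# returns to horizon diminish (log) while training cost grows super-linearly
# (power law, standing in for weaker gradient signal and scarcer data).
# None of these constants are empirical.

import math

def value(h: float) -> float:
    return 10.0 * math.log1p(h)    # diminishing returns to horizon

def cost(h: float) -> float:
    return 0.05 * h ** 1.5         # super-linear training cost

def net(h: float) -> float:
    return value(h) - cost(h)

if __name__ == "__main__":
    horizons = [2 ** k for k in range(12)]  # horizons 1 ... 2048 "steps"
    best = max(horizons, key=net)
    for h in horizons:
        print(f"h={h:5d}  value={value(h):6.2f}  cost={cost(h):8.2f}  net={net(h):7.2f}")
    print(f"economical sweet spot under these assumptions: h={best}")
```

Under these made-up curves the net value peaks around h = 32 and collapses for long horizons; any real version of this argument would of course need empirical scaling data in place of the assumed forms.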