Very short. Longer timelines are logically possible, but I wouldn’t count on them.
As for the notion that something like CEV might require decades of thought to figure out, or even decades of trial and error with AGI: that's just a guess. I may sound monotonous bringing up June Ku over and over again (there are others whose work I intend to study too), but metaethical.ai is an extremely promising schema. If a serious effort were made to fill out that schema, while also critically but constructively examining its assumptions from all directions, who knows how far we'd get, and how quickly?
Another argument for shorter CEV timelines is that AI itself may help complete the theory of CEV alignment. Along with the traditional powers of computation (calculation, optimization, deduction, etc.), language models, despite their highly uneven output, are giving us a glimpse of what it will be like to have AI contributing even to discussions like this. That day isn't far off at all.
So from my perspective, long CEV timelines don't actually seem likely. The other thing I have great doubts about is the stability of any world order in which a handful of humans (even if it were the NSA or the UN Security Council) use tool AGI to prevent everyone else from developing unsafe AGI. Targeting just one thing like GPUs won't work forever, because computation can be done in other ways; there will be great temptations to use tool AGI to carry out interventions that have nothing to do with stopping unsafe AGI… Anyone in such a position becomes a kind of world government.
The problem of “world government”, or “what the fate of the world should be”, is something that CEV is meant to solve comprehensively, by providing an accurate first-principles extrapolation of humanity's true volition, and so on. But here, the scenario is an AGI-powered world takeover in which the problems of governance and normativity have not been figured out. I'm not at all opposed to thinking about such scenarios; the next chapter of human affairs may indeed be one in which autonomous superhuman AI does not yet exist, but human elites possess tool AGI. I just think that's a highly unstable situation; making it stable and safe for humans would be hard without having something like CEV figured out; and because of the input of AI itself, one shouldn't expect figuring out CEV to take a long time. I propose that it will be done relatively quickly, or not at all.
Another argument for shorter CEV timelines is that AI itself may help complete the theory of CEV alignment.
I agree with this part. That’s why I’ve been saying ‘maybe we can do this in a few subjective decades or centuries’ rather than ‘maybe we can do this in a few subjective millennia.’ 🙂
But I'm mostly imagining AGI helping us get CEV theory faster, which obviously requires a lot of prior alignment just to make use of the AGI safely and to trust its outputs.
The idea is to keep ratcheting up alignment so we can safely make use of more capabilities, and then, in at least some cases, to use those new capabilities to further improve and accelerate the next ratcheting-up of alignment.
Along with the traditional powers of computation (calculation, optimization, deduction, etc.), language models, despite their highly uneven output, are giving us a glimpse of what it will be like to have AI contributing even to discussions like this. That day isn't far off at all.
… And that makes you feel optimistic about the rush-to-CEV option? 'Unaligned AGIs or proto-AGIs generating plausible-sounding arguments about how to do CEV' is not a scenario that makes me update toward humanity surviving.
there will be great temptations to use tool AGI to carry out interventions that have nothing to do with stopping unsafe AGI...
I share your pessimism about any group that would feel inclined to give in to those temptations, when the entire future light cone is at stake.
The scenario where we narrowly avoid paperclips by the skin of our teeth, and now have a chance to breathe and think things through before taking any major action, is indeed fragile in some respects: there are many ways to rapidly destroy all of the future's value by overextending (e.g., by using more AGI capabilities than you can currently align, locking in contemporary human values that shouldn't be locked in, or hastily picking the wrong theory or implementation of 'how to do moral progress').
I don't think we necessarily disagree about anything except 'how hard is CEV?'. It sounds to me like we'd mostly have the same intuitions conditional on 'CEV is very hard'; but I take that very much for granted, so I'm freely focusing my attention on 'OK, how could we make things go well given that fact?'.
I don't think we necessarily disagree about anything except 'how hard is CEV?'. It sounds to me like we'd mostly have the same intuitions conditional on 'CEV is very hard'
I disagree on the plausibility of a stop-the-world scheme. Things are now as safe and stable as they will ever be. I think it's a better plan to use the rising tide of AI capabilities to flesh out CEV. In particular, since the details of CEV depend on not-yet-understood details of human cognition, one should think about how to use near-future AI power to extract those details from the available data on brains and behavior. But AI can contribute everywhere else in the development of CEV theory and practice too.
But I won't try to change your thinking. In practice, MIRI's work on simpler forms of alignment (and its work on logical induction, decision theory paradigms, etc.) is surely relevant to a "CEV now" approach too. What I am wondering is: where is the de facto center of gravity for a "CEV now" effort? I've said many times that June Ku's blueprint is the best I've seen, but I don't see anyone working to refine it. And there are other people whose work seems relevant and has a promising rigor, but I'm not sure how it all best fits together.
edit: I do see value in non-CEV alignment work, distinct from figuring out how to stop the world safely: it reduces the risk arising from the general use of advanced AI systems. So it is a contribution to AI safety.