Another argument for shorter CEV timelines is that AI itself may help complete the theory of CEV alignment.
I agree with this part. That's why I've been saying "maybe we can do this in a few subjective decades or centuries" rather than "maybe we can do this in a few subjective millennia." 🙂
But I'm mostly imagining AGI helping us get CEV theory faster. Which obviously requires a lot of prior alignment just to make use of the AGI safely, and to trust its outputs.
The idea is to keep ratcheting up alignment so we can safely make use of more capabilities, and then, in at least some cases, to use those new capabilities to further improve and accelerate our next ratcheting-up of alignment.
Along with the traditional powers of computation (calculation, optimization, deduction, etc.), language models, despite their highly uneven output, are giving us a glimpse of what it will be like to have AI contributing even to discussions like this. That day isn't far off at all.
⌠And that makes you feel optimistic about the rush-to-CEV option? âUnaligned AGIs or proto-AGIs generating plausble-sounding arguments about how to do CEVâ is not a scenario that makes me update toward humanity surviving.
there will be great temptations to use tool AGI to carry out interventions that have nothing to do with stopping unsafe AGI...
I share your pessimism about any group that would feel inclined to give in to those temptations, when the entire future light cone is at stake.
The scenario where we narrowly avoid paperclips by the skin of our teeth, and now have a chance to breathe and think things through before taking any major action, is indeed a fragile one in some respects, where there are many ways to rapidly destroy all of the future's value by overextending. (E.g., using more AGI capabilities than you can currently align, or locking in contemporary human values that shouldn't be locked in, or hastily picking the wrong theory or implementation of "how to do moral progress".)
I don't think we necessarily disagree about anything except "how hard is CEV?" It sounds to me like we'd mostly have the same intuitions conditional on "CEV is very hard"; but I take this very much for granted, so I'm freely focusing my attention on "OK, how could we make things go well given that fact?".
I don't think we necessarily disagree about anything except "how hard is CEV?" It sounds to me like we'd mostly have the same intuitions conditional on "CEV is very hard"
I disagree on the plausibility of a stop-the-world scheme. The way things are now is as safe or stable as they will ever be. I think it's a better plan to use the rising tide of AI capabilities to flesh out CEV. In particular, since the details of CEV depend on not-yet-understood details of human cognition, one should think about how to use near-future AI power to extract those details from the available data regarding brains and behavior. But AI can contribute everywhere else in the development of CEV theory and practice too.
But I won't try to change your thinking. In practice, MIRI's work on simpler forms of alignment (and its work on logical induction, decision-theory paradigms, etc.) is surely relevant to a "CEV now" approach too. What I am wondering is: where is the de facto center of gravity for a "CEV now" effort? I've said many times that June Ku's blueprint is the best I've seen, but I don't see anyone working to refine it. And there are other people whose work seems relevant and has a promising rigor, but I'm not sure how it best fits together.
edit: I do see a value to non-CEV alignment work, distinct from figuring out how to stop the world safely: namely, reducing the risk arising from the general use of advanced AI systems. So it is a contribution to AI safety.