Good essay! Two questions if you have a moment:

1. Can you flesh out your view of how the community is making “slow but steady progress right now on getting ready”? In my view, much of the AI safety community seems to be doing things that have unclear safety value to me, like (a) coordinating a pause in model training that seems likely to me to make things less safe if implemented (because it would lead to algorithmic and hardware overhangs) or (b) converting to capabilities work (quite common; it seems like an occupational hazard for someone with initially “pure” AI safety values). Of course, I don’t mean to be disparaging, as plenty of AI safety work does seem useful qua safety to me, like making more precise estimates of takeoff speeds or doing cybersecurity work. I was just surprised by that statement, and I’m curious how you are tracking progress here.
2. It seems like you think there are some key algorithmic insights which, once “unlocked”, will lead to dramatically faster AI development. This suggests that not many people are working on algorithmic insights. But that doesn’t seem quite right to me—isn’t that a huge group of researchers, many of whom have historically been anti-scaling? Or maybe you think there are core insights available, but the field hasn’t had (enough of) its Einsteins or von Neumanns yet? Basically, I’m trying to get a sense of why you have such fast takeoff-speed estimates conditional on certain algorithmic progress. But maybe I’m not understanding your worldview, and/or maybe it’s too infohazardous to discuss.
Can you flesh out your view of how the community is making “slow but steady progress right now on getting ready”?
I finished writing this less than a year ago, and it seems to be meaningfully impacting a number of people’s thinking, hopefully for the better. I personally feel strongly like I’m making progress on a worthwhile project and would like lots more time to carry it through, and if it doesn’t work out I have others in the pipeline. I continue to have ideas at a regular clip that I think are both important and obvious-in-hindsight, and to notice new mistakes that I and others have been making. I don’t know that any of the above will be convincing to a skeptic, but it seems worth mentioning that my first-person perspective feels very strongly like progress is being made each year.
A couple of concepts that emerged in the past few years that I find very important and clarifying for my thinking are “Concept Extrapolation” by Stuart Armstrong (which he continues to work on) and “goal misgeneralization” by Krueger (notwithstanding controversy, see here). As another example, I don’t think Paul Christiano et al.’s ELK project is the most urgent thing, but I still think it’s worthwhile, and AFAICT it’s moving along steadily.
I dunno. Read the Alignment Forum. Do you really find absolutely everything there totally useless? That seems like a pretty strong and unlikely statement from my perspective.
It’s critically important that, in the ML community as a whole, people are generally familiar with TAI risks, and at least have reasonable buy-in that they’re a real thing, as opposed to mocking disinterest. After all, whatever alignment ideas we have, the programmers have to actually implement them and test them, and their managers have to strongly support that too. I’m not too certain here, but my guess is that things are moving more forward than backwards in that domain. For example, my guess is that younger people entering the field of AI are less likely to mock the idea of TAI risk than are older people retiring out of the field of AI. I’d be curious if someone could confirm or disconfirm that.
It seems like you think there are some key algorithmic insights which, once “unlocked”, will lead to dramatically faster AI development. This suggests that not many people are working on algorithmic insights.
I dispute that. It’s not like algorithmic insights appear as soon as people look for them. For example, Judea Pearl published the belief propagation algorithm in 1982. Why hadn’t someone already published it in 1962? Or 1922? (That’s only partly rhetorical—if you or anyone has a good answer, I’m interested to hear it!)
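(For readers who haven’t seen it: belief propagation is simple enough to fit in a few lines. Below is a minimal sum-product sketch on a toy chain of three binary variables — all the potential values are made-up numbers purely for illustration — checked against brute-force enumeration.)

```python
import itertools

# Toy chain-structured model x0 - x1 - x2 with made-up illustrative potentials:
# unary potentials phi[i][v] and pairwise potentials psi[i][u][v] coupling x_i, x_{i+1}.
phi = [[0.6, 0.4],
       [0.5, 0.5],
       [0.3, 0.7]]
psi = [[[1.0, 0.5], [0.5, 1.0]],
       [[1.0, 0.2], [0.2, 1.0]]]

def bp_marginal(i):
    """Exact marginal of x_i on a chain via forward/backward sum-product messages."""
    n = len(phi)
    fwd = [[1.0, 1.0] for _ in range(n)]  # fwd[k]: message arriving at x_k from the left
    bwd = [[1.0, 1.0] for _ in range(n)]  # bwd[k]: message arriving at x_k from the right
    for k in range(1, n):
        fwd[k] = [sum(fwd[k - 1][u] * phi[k - 1][u] * psi[k - 1][u][v] for u in (0, 1))
                  for v in (0, 1)]
    for k in range(n - 2, -1, -1):
        bwd[k] = [sum(bwd[k + 1][v] * phi[k + 1][v] * psi[k][u][v] for v in (0, 1))
                  for u in (0, 1)]
    unnorm = [fwd[i][v] * phi[i][v] * bwd[i][v] for v in (0, 1)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

def brute_marginal(i):
    """Same marginal by enumerating every joint assignment (exponential, for checking)."""
    n = len(phi)
    totals = [0.0, 0.0]
    for xs in itertools.product((0, 1), repeat=n):
        w = 1.0
        for k in range(n):
            w *= phi[k][xs[k]]
        for k in range(n - 1):
            w *= psi[k][xs[k]][xs[k + 1]]
        totals[xs[i]] += w
    z = sum(totals)
    return [t / z for t in totals]

# The message-passing answer matches brute force on every node.
for i in range(3):
    assert all(abs(a - b) < 1e-12 for a, b in zip(bp_marginal(i), brute_marginal(i)))
```

The point of the sketch: the whole algorithm is a handful of multiply-accumulate loops over small tables, well within reach of early computers (or pencil and paper, for small models).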
For example, people have known for decades that flexible hierarchical planning is very important in humans but no one can get it to really work well in AI, especially in a reinforcement learning context. I assume that researchers are continuing to try as we speak. Speaking of which, I believe the conventional wisdom in ML is that RL is a janky mess. It’s notable that LLMs have largely succeeded by just giving up on RL (apart from an optional fine-tuning step at the end), as opposed to by getting RL to really work well. And yet RL does work really well in human brains. RL is much, much more centrally involved in human cognition than in LLM training. So it’s not that RL is fundamentally impossible; it’s just that people can’t get it to work well. And it sure isn’t for lack of trying! So yeah, I think there remain algorithmic ideas yet to be discovered.
For example, Judea Pearl published the belief propagation algorithm in 1982. Why hadn’t someone already published it in 1962? Or 1922?
Belief propagation is the kind of thing that most people wouldn’t work on in an age before computers. It would be difficult to evaluate/test, but more importantly wouldn’t have much hope for application. Seems to me it arrived at a pretty normal time in our world.
For example, people have known for decades that flexible hierarchical planning is very important in humans but no one can get it to really work well in AI, especially in a reinforcement learning context.
Belief propagation is the kind of thing that most people wouldn’t work on in an age before computers. It would be difficult to evaluate/test, but more importantly wouldn’t have much hope for application.
Hmm. I’m not sure I buy that. Can’t we say the same thing about the FFT? Doing belief prop by hand doesn’t seem much different from doing an FFT by hand, and both belief prop and the FFT were totally doable on a 1960s mainframe, if not earlier, AFAICT. But the modern FFT algorithm was published in 1965; people got the gist of it in 1942, and in 1932, and even Gauss in 1805 had the basic idea (according to Wikipedia). FFTs are obviously super useful, but OTOH people do seem to find belief prop useful today for various things, as far as I can tell, and I don’t see why they wouldn’t have found it useful in the 1960s as well if they had known about it.
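To make the “doable on a 1960s mainframe” claim concrete, here is a minimal radix-2 Cooley-Tukey FFT sketch next to the naive O(n²) DFT it accelerates (assuming a power-of-two input length; the sample input is arbitrary toy data):

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def dft(x):
    """Naive O(n^2) DFT, i.e. the 'by hand' version, for comparison."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

xs = [1.0, 2.0, 0.0, -1.0, 1.5, 0.0, -2.0, 0.5]  # arbitrary toy signal
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(xs), dft(xs)))
```

The recursive split is the entire trick, which arguably supports the puzzle above: nothing in it required 1960s hardware to discover, only to exploit at scale.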
What do you think of diffusion planning?
I think it’s interesting, thanks for sharing! But I have no other opinion about it to share. :)
If (for the sake of argument) Diffusion Planning is the (or part of the) long-sought path to getting flexible hierarchical planning to work well in practical AI systems, then I don’t think that would undermine any of the main points I’m trying to make here. Diffusion Planning was, after all, (1) just published last year, (2) still at the “proof of principle / toy models” stage, and (3) not part of the existing LLM pipeline / paradigm.
Thanks! I agree with you about all sorts of AI alignment essays being interesting and seemingly useful. My question was more about how to measure the net rate of AI safety research progress. But I agree that an expert’s inside view (like yours) of how insights are accumulating is a reasonable metric. I also agree that acceptance in the ML community of TAI x-risk as a real thing is useful, and that—while I am slightly worried about the risk of overshooting, as Scott Alexander describes—this situation seems to be generally improving.
Regarding (2), my question is why algorithmic progress would lead to such discontinuous growth in AI capabilities. I agree that RL works much better in humans than in machines, but I doubt that replicating this in machines would require just one or a few algorithmic advances. Instead, my guess, based on previous technology growth stories I’ve read about, is that AI algorithmic progress is likely to occur through the accumulation of many small improvements over time.
Oh, I somehow missed that your original question was about takeoff speeds. When you wrote “algorithmic insights…will lead to dramatically faster AI development”, I misread it as “algorithmic insights…will lead to dramatically more powerful AIs”. Oops. Anyway, takeoff speeds are off-topic for this post, so I won’t comment on them, sorry. :)
I would not describe the development of deep learning as discontinuous, but I would describe it as fast. As far as I can tell, deep learning developed through the accumulation of many small improvements over time, sometimes humorously described as “graduate student descent” (better initialization, better activation functions, better optimizers, better architectures, better regularization, etc.). It seems possible or even probable that brain-inspired RL could follow a similar trajectory once it takes off, absent interventions like changes to open-publishing norms.