So if I’m understanding this correctly, capabilities and safety are highly correlated, and there can’t be situations where capabilities and alignment decouple.
Not that far; more like they don’t decouple until more progress has been made. Pure alignment is an advanced subtopic of AI research that only becomes a viable field once that progress has happened.
I’m not super confident in the above, and I wouldn’t discourage people from doing alignment work now (plus the obvious nuance that it’s not one big lump: some things can be done earlier and some later). But the idea of alignment work that requires a whole bunch of serial effort, independent of AI capability work, doesn’t seem plausible to me. From Nate Soares’ post:
The most blatant case of alignment work that seems serial to me is work that requires having a theoretical understanding of minds/optimization/whatever, or work that requires having just the right concepts for thinking about minds.
This is the kind of thing that seems inextricably bound up with capability work to me. My impression is that MIRI tends to think that whatever route we take to get to AGI, as it moves from subhuman to human-level intelligence it will transform into the kind of mind they theorize about (and that this will happen before it goes foom), no matter how different it was when it started. So even if they don’t know what a state-of-the-art RL agent will look like five years from now, they feel confident they can theorize about what it will look like ten years from now. Whereas my view is that if you can’t get the former right, you won’t get the latter right either.
To the extent that intelligences will converge towards a certain optimal way of thinking as they get smarter, being able to predict what that looks like will involve a lot of capability work (“Hmm, maybe it will learn like this; let’s code up an agent that learns that way and see how it does”). If you’re not grounding your work in concrete experiments you will end up with mistakes in your view of what an optimal agent looks like and no way to fix them.
A big part of my view is that we seem to still be a long way from AGI. This hinges on how “real” the intelligence behind LLMs is. If we have to take the RL route then we are a long way away. I wrote a piece on this, “What Happened to AIs Learning Games from Pixels?”, which points out how slow the progress has been and covers the areas where the field is stuck. On the other hand, if we can get most of the way to AGI just with massive self-supervised training, then it starts seeming more likely that we’ll walk into AGI without having a good understanding of what’s going on. I think that the failure of VPT for Minecraft compared to GPT for language, and the difficulty LLMs have with extrapolation and innovation, means that self-supervised learning won’t be enough without more insight. I’ll be paying close attention to how GPT-4 and other LLMs do over the next few years to see if they’re making progress faster than I thought, but I talked to ChatGPT and it was way worse than I thought it’d be.
I like your comments, 307th, and your linked post on RL SotA. I don’t agree with everything you say, but some of what you say is quite on point. In particular I agree that ‘RL is currently being rather unimpressive in achieving complicated goals in complex wide-possible-action-space simulation worlds’. I agree that some fundamental breakthroughs are needed to change this, not just scaling of existing methods. I disagree that such breakthroughs will necessarily require many calendar years of research. I think the big research labs will soon turn their attention more fully to tackling complex-world RL, and that it won’t be long at all before significant breakthroughs start being made.
I think rather than thinking about research progress in terms of years, or even ‘researcher hours’, it’s more helpful to think of progress in terms of ‘research points’ devoted to the specific topic. An hour from a highly effective researcher at a well-funded lab, with a well-set-up research environment that makes new experiments easy to run, is worth vastly more ‘research points’ towards a topic than an hour from a compute-limited grad student without polished experiment-running code, without access to huge compute resources, and without much experience running large experiments over many variables.
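To make the ‘research points’ framing concrete, here is a minimal toy model of my own; nothing in it comes from the original comments, and the class name, multiplier factors, and numbers below are hypothetical, chosen only to illustrate the asymmetry between a well-resourced hour and a poorly-resourced one.

```python
from dataclasses import dataclass


@dataclass
class ResearchEffort:
    """Toy model: one researcher's effort on a single topic."""
    hours: float    # raw hours spent on the topic
    skill: float    # researcher-effectiveness multiplier (hypothetical)
    tooling: float  # quality of the experiment-running setup (hypothetical)
    compute: float  # access to large-scale compute (hypothetical)

    def points(self) -> float:
        # In this sketch the multipliers compound, so one hour at a
        # well-funded lab yields far more 'research points' than one
        # hour under heavy resource constraints.
        return self.hours * self.skill * self.tooling * self.compute


# Illustrative numbers only.
lab_researcher = ResearchEffort(hours=1, skill=3.0, tooling=4.0, compute=5.0)
grad_student = ResearchEffort(hours=1, skill=1.0, tooling=1.0, compute=1.0)

print(lab_researcher.points())  # 60.0 'research points' for the hour
print(grad_student.points())    # 1.0
```

The multiplicative form is just one way to cash out the intuition; the point is only that progress tracks something like weighted effort, not calendar time.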