Dual-Useness is a Ratio
A lot of AI-risk-concerned people are struggling with how to relate to dual-use research, and relatedly, to doing alignment research inside of AI orgs. There’s a pretty simple concept that seems, to me, to be key to thinking about this coherently: the dual-useness ratio. Most prosaic alignment techniques are going to shorten timelines by some amount, and improve our chance of success by some amount. You don’t want to round that off to a boolean.
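To make the ratio-not-boolean framing concrete, here’s a toy sketch. The technique names and numbers are invented purely for illustration (they are not estimates of any real research direction); the point is only that you compare directions by how much alignment benefit they buy per unit of acceleration, rather than sorting work into an “alignment” bucket and a “capabilities” bucket.

```python
from dataclasses import dataclass

@dataclass
class Technique:
    """Toy model of a research direction's dual-use profile."""
    name: str
    alignment_benefit: float        # hypothetical units of improved success chance
    capability_acceleration: float  # hypothetical units of timeline shortening

    @property
    def dual_useness_ratio(self) -> float:
        # Higher is better: more alignment progress per unit of acceleration.
        return self.alignment_benefit / self.capability_acceleration

# All numbers below are made up for illustration.
candidates = [
    Technique("hypothetical technique A", alignment_benefit=3.0, capability_acceleration=1.0),
    Technique("hypothetical technique B", alignment_benefit=0.5, capability_acceleration=5.0),
]

# Rank by ratio instead of labeling each one "alignment" or "capabilities".
for t in sorted(candidates, key=lambda t: t.dual_useness_ratio, reverse=True):
    print(f"{t.name}: ratio = {t.dual_useness_ratio:.2f}")
```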
E.g., I’ve had arguments about whether RLHF is more like an alignment technique, or more like a capabilities technique, and how that should affect our view of OpenAI. My view is that it’s both, and that the criticism of it is that its alignment-to-acceleration ratio isn’t high enough, because it helps a lot with the kind of alignment that’s necessary for commercialization, but doesn’t help that much with the kind of alignment that’s necessary for preventing a superintelligence from killing everyone.
I think it’s pretty important to get alignment-cautious people into the organizations that are developing and deploying AIs, and I think a lot of prosaic alignment techniques with good ratios may be easier to research from inside an AI lab than outside of one. But, the default outcome of sending people to “work on alignment inside of OpenAI” is that they try to find something with a good dual-useness ratio, fail, and work on something with a ratio that’s just barely good enough that they can tell a positive story about it to their acquaintances.
I don’t expect that problem to ever be straightforward. But, if you’re trying to decide who should go work on alignment inside of AI labs, as opposed to working in orgs that don’t train language models, or in a different field entirely, I think you can get a lot of mileage out of one key distinction: some people get wedded to one big idea, while other people have lots of ideas and are more willing to drop them. I expect the latter sort of person to wind up working on ideas with much better dual-useness ratios, because they have flexibility that they can use to select for that.