Great post. I think this type of strategic thinking is too rare in alignment and other academic disciplines.
I’d just modify your bottom line a little bit. The goal isn’t quite to “solve alignment as quickly as possible”. The goal is to maximize the odds that we’ve solved alignment in time to prevent human disempowerment. That’s importantly different.
It means solving alignment for the first type of AGI we build, before it’s deployed.
Having an alignment solution that applies to some type of AGI nobody is building doesn’t help. Yet a lot of otherwise brilliant alignment work goes in that direction, and IMO gets that big zero impact multiplier.
> I think if you could demonstrably “solve alignment” for any architecture, you’d have a decent chance of convincing people to build it as fast as possible, in lieu of other avenues they had been pursuing.
Some people. But it would depend on what the prospects were for that type of AGI, because I don’t think you could convince everyone else to stop working on other types. So it would be a race between the new “more alignable” type and the currently-leading types. If the “more alignable” type seemed guaranteed to lose that race, I’m not sure many people would even try building it.