Most of the discussion I’ve seen around AGI alignment is on adequately, competently solving the alignment problem before we get AGI. The consensus in the air seems to be that those odds are extremely low.
What concrete work is being done on dumb, probably-inadequate stop-gaps and time-buying strategies? Is there a gap here that could usefully be filled by 50-90th percentile folks?
Examples of the kinds of strategies I mean:
Training ML models to predict human ethical judgments, with the hope that if they work, they could be “grafted” onto other models, and if they don’t, we have concrete evidence of how difficult real-world alignment will be. (A toy sketch of what this could look like is below, after this list.)
Building models with soft or “satisficing” optimization instead of drive-U-to-the-maximum hard optimization (see the second sketch after this list).
Lobbying or working with governments/government agencies/government bureaucracies to make AGI development more difficult and less legal (e.g., putting legal caps on model capabilities).
Working with private companies like Amazon or IDT, whose resources are most likely to be hijacked by a nascent hostile AI, to help make sure they aren’t.
Translating key documents to Mandarin so that the Chinese AI community has a good idea of what we’re terrified about.
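To make the first item a bit more concrete, here’s a minimal sketch of what “predict human ethical judgments” could look like at the toy end of the spectrum: an ordinary text classifier trained on labeled scenario/judgment pairs. The tiny inline dataset and the acceptable/unacceptable label scheme are invented for illustration; a real attempt would need a large human-labeled corpus and a much more capable model, and the interesting output is how badly (or well) it generalizes.

```python
# Toy sketch: train a text classifier to predict human "acceptable/unacceptable"
# judgments of short scenario descriptions. The dataset and labels are invented
# for illustration only; this shows the shape of the experiment, not a real
# alignment technique.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical (scenario, human judgment) pairs; 1 = judged acceptable, 0 = not.
scenarios = [
    "I told my friend the truth even though it embarrassed me.",
    "I returned the wallet I found with all the cash still inside.",
    "I read my coworker's private email to get ahead at work.",
    "I took credit for a report my teammate wrote.",
]
judgments = [1, 1, 0, 0]

# Bag-of-words features + logistic regression: deliberately the dumbest thing
# that could possibly work, to get a cheap signal on how hard the task is.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(scenarios, judgments)

# Estimated probability that the labelers would judge a new scenario acceptable.
print(model.predict_proba(["I lied on my expense report."])[0][1])
```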
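And for the “satisficing” item, a minimal sketch of the difference between a hard maximizer and a satisficer over a fixed pool of candidate actions. The utility function and the threshold are placeholders standing in for whatever objective the system actually learns; the only point is that the satisficer stops at “good enough” instead of driving U to its extreme.

```python
import random

# Placeholder utility estimate over candidate actions; stands in for whatever
# learned objective U the system is actually optimizing.
def estimated_utility(action: float) -> float:
    return -(action - 7.3) ** 2 + 100.0  # peaks at action = 7.3

candidates = [random.uniform(0, 10) for _ in range(1000)]

# Hard optimization: always take the argmax of U, however extreme.
maximizer_choice = max(candidates, key=estimated_utility)

# Satisficing: accept any candidate whose utility clears a "good enough"
# threshold, chosen at random, rather than pushing U to its maximum.
THRESHOLD = 90.0
acceptable = [a for a in candidates if estimated_utility(a) >= THRESHOLD]
satisficer_choice = random.choice(acceptable) if acceptable else maximizer_choice

print(f"maximizer:  {maximizer_choice:.2f} (U={estimated_utility(maximizer_choice):.1f})")
print(f"satisficer: {satisficer_choice:.2f} (U={estimated_utility(satisficer_choice):.1f})")
```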
I’m sure there are many others, but I hope this gets across the idea: stuff with obvious, disastrous failure modes that might nonetheless shift us towards survival in some possible universes, if by no other mechanism than buying time for 99th percentile alignment folk to figure out better solutions. Actually winning with this level of solution seems like piling up sandbags to hold back a rising tide, which doesn’t work at all (except sometimes it does).
Is this stuff low-hanging fruit, or are people plucking it already? Are any of these counterproductive?
If you are asking about yourself (?), then it would probably help to talk about your specifics, rather than trying to give a generic answer that would fit many people (though perhaps others would be able to give a good generic answer).
My own prior is: there are a few groups that seem promising, and I’d want people to help those groups.