I’d like to see justification addressing questions like “under what conditions does speculation about ‘superintelligent consequentialism’ merit research attention at all?” and “why do we think ‘future architectures’ will have property X?”
One of my mental models for alignment work is “contingency planning”. There are a lot of different ways AI research could go. Some might be dangerous. Others less so. If we can forecast possible dangers in advance, we can try to steer towards safer designs, and generate contingency plans with measures to take if a particular forecast for AI development ends up being correct.
The risk here is “person with a hammer” syndrome, where people try to apply mental models from thinking about superintelligent consequentialists to other AI systems in a tortured way, smashing round pegs into square holes. I wish people would look at the territory more, and do a bit more blue-sky security thinking about unknown unknowns, instead of endlessly reaching for the classic arguments even when they don’t really apply.
A specific research proposal: develop a taxonomy of how AGI could work by identifying the cruxes researchers disagree on; then, for each entry in the taxonomy, estimate a safety rating, identify novel considerations that apply to it, and summarize the alignment proposals that look most promising for that entry.
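To make the proposal concrete, here is a minimal sketch (in Python, purely illustrative) of what a single taxonomy entry could look like as a data structure. The field names, rating buckets, and the example entry are my own assumptions about what such a taxonomy might track, not part of the original proposal.

```python
from dataclasses import dataclass, field
from enum import Enum


class SafetyRating(Enum):
    """Rough, subjective safety buckets for a hypothesized AGI design."""
    LOW_RISK = "low risk"
    MODERATE_RISK = "moderate risk"
    HIGH_RISK = "high risk"
    UNKNOWN = "unknown"


@dataclass
class AGIDesignEntry:
    """One entry in a hypothetical taxonomy of ways AGI research could go."""
    name: str                                  # short label for the design
    cruxes: list[str]                          # researcher disagreements this entry hinges on
    safety_rating: SafetyRating                # estimated rating, clearly a guess
    novel_considerations: list[str]            # considerations the classic arguments may miss
    promising_alignment_proposals: list[str] = field(default_factory=list)


# Hypothetical example entry; the content is illustrative, not a real assessment.
example = AGIDesignEntry(
    name="scaled-up sequence predictor",
    cruxes=["Does scaling prediction alone yield agentic planning?"],
    safety_rating=SafetyRating.UNKNOWN,
    novel_considerations=["risks arising from simulated agents rather than the model itself"],
    promising_alignment_proposals=["conditioning-based control", "interpretability audits"],
)
```

The point of a structure like this is just to force each forecast to carry its own safety assessment and its own matched alignment proposals, rather than inheriting the superintelligent-consequentialist analysis by default.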