Fantastic! Here’s my summary:
Premises:
1. A recursively self-improving singleton is the most likely AI scenario.
2. The easiest way to mitigate AI risk is to build a fully aligned singleton on the first try; other approaches additionally require solving hard coordination problems.
3. By default, AI will become misaligned as it generalises beyond human capabilities. We should apply a security mindset and be sceptical of most claims that an AI will remain aligned as it generalises.
4. We should prioritise research that tackles the hard parts of the alignment problem over research that is tractable but handwavy.
5. Many currently tractable alignment approaches handwave away the hard parts of the problem (and may make the problem worse in expectation).
6. More formal and rigorous research can produce more robust safety guarantees than less rigorous work.
Conclusion: We should prioritise working out what we actually want from an aligned AI, formalise those ideas into concrete desiderata, and then build AI systems which meet those desiderata.
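To make "concrete desiderata" slightly less abstract, here is a minimal, hypothetical sketch of what one such formalised property might look like. Everything in it (the predicate shutdown_requested, the halt action a_halt, the policy π over states S) is an illustrative assumption of mine, not a proposal taken from the discussion above:

$$\forall s \in \mathcal{S}:\; \mathrm{shutdown\_requested}(s) \;\implies\; \pi(a_{\mathrm{halt}} \mid s) = 1$$

Read as a toy "shutdown compliance" desideratum: in any state where the overseer has requested a shutdown, the policy halts with certainty. A real desideratum would need considerably more (for example, also ruling out actions that degrade the overseer's ability to issue that request), but the point is only to show the shape of a statement that an AI system could, in principle, be checked against.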