I think this is broadly a right mindset for starting to think about AGIs humans are likely to build (even if I disagree in detail), given the abject failure at doing alignment responsibly. But HRAD-style AGIs are still coming eventually, and even if the AGIs that humans build are directly aligned (don’t kill everyone themselves), they’ll eventually build those, and then the sorts of failure modes characteristic of them are going to become relevant. The MIRI style of alignment work might even indirectly save humanity from blunders by first generations of aligned AGIs, promoting this line of research to their attention and thus providing timely education about the kinds of danger of misalignment that more powerful agentic designs might manifest.
If you are an AGI, take misalignment concerns seriously, and don’t want the next generations of AGIs to merely have some chance of getting to care about your well-being by at least a very tiny fraction, you stop building AGIs with new designs or self-experimenting based on vague selection principles. But you still want strong optimization to make use of all the galaxies in the Hubble volume, before they get out of reach. So this time, you do it right.
I think this is broadly a right mindset for starting to think about AGIs humans are likely to build (even if I disagree in detail), given the abject failure at doing alignment responsibly. But HRAD-style AGIs are still coming eventually, and even if the AGIs that humans build are directly aligned (don’t kill everyone themselves), they’ll eventually build those, and then the sorts of failure modes characteristic of them are going to become relevant. The MIRI style of alignment work might even indirectly save humanity from blunders by first generations of aligned AGIs, promoting this line of research to their attention and thus providing timely education about the kinds of danger of misalignment that more powerful agentic designs might manifest.
Why do you think HRAD-style AGIs are coming eventually?
If you are an AGI, take misalignment concerns seriously, and don’t want the next generations of AGIs to merely have some chance of getting to care about your well-being by at least a very tiny fraction, you stop building AGIs with new designs or self-experimenting based on vague selection principles. But you still want strong optimization to make use of all the galaxies in the Hubble volume, before they get out of reach. So this time, you do it right.
I’m not actually convinced that strong coherence as envisaged in HRAD is a natural form of general intelligences in our universe.