I was surprised by Nate’s high confidence in Unconscious Meh given misaligned ASI. Other people also seem to be quite confident in the same way. In contrast, my own ass-numbers for {the misaligned ASI scenario} are something like
10% Conscious Meh,
60% Unconscious Meh,
30% Weak Dystopia.
(And it would be closer to 50-50 between Unconscious Meh and Weak Dystopia, before I take into account others’ views.)
In a lossy nutshell, my reasons for the relatively high Weak Dystopia probabilities are something like
many approaches to training AGI currently seem to have as a training target something like “learn to predict humans”, or some other objective that is humanly-meaningful but not-our-real-values,
plus Goodhart’s law.
I’m very curious about why people have high confidence in {Unconscious Meh given misaligned ASI}, and why people seem to assign such low probabilities to {(Weak) Dystopia given misaligned ASI}.
many approaches to training AGI currently seem to have as a training target something like “learn to predict humans”, or some other objective that is humanly-meaningful but not-our-real-values,
I don’t know whether this will continue in the future (all the way up to AGI). If it does, then it strikes me as a sufficiently coarse-grained approach (that’s bad enough at inner alignment, and bad enough at outer-alignment-to-specific-things-we-actually-care-about) that I’d still be pretty surprised if the result (in the limit of superintelligence) bears any resemblance to stuff we care much about, good or bad.
E.g., there are many more “unconscious configurations of matter that bear some relation to things you learn in trying to predict humans” than there are “conscious configurations of matter that bear some relation to things you learn in trying to predict humans”. Building an entire functioning conscious mind is still a very complicated end-state that requires getting lots of bits into the AGI’s terminal goals correctly; it doesn’t necessarily become that much easier just because we’re calling the ability we’re training “human prediction”. Like, a superintelligent paperclipper would also be excellent at the human prediction task, given access to information about humans.
(I’ll also mention that I think it’s a terrible idea for safety-conscious AI researchers to put all their eggs in the “train AI via lots of data on humans” basket. But that’s a separate question from what AI researchers are likely to do in practice.)