What standard is your baseline for a safe AGI?
If it is ‘a randomly generated AGI meeting the safety standard is no more dangerous than a randomly selected human intelligence’, then this proposal looks intended to guarantee performance no worse than average-case human alignment.
Not sure it hits that target, but it looks like it’s aiming for it. I understand your argument to be that the worst-case AI alignment under the scheme could be as bad as the worst human, amplified, and that you have no way of assessing the average-case or worst-case alignment before firing up the machine.
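To make the average-case versus worst-case gap concrete, here is a minimal Monte Carlo sketch in Python. The misalignment score, its distribution, and the AMPLIFICATION multiplier are illustrative assumptions of mine, not anything taken from the proposal; the point is only that a standard constraining the average says nothing about the amplified tail.

```python
import random

# Illustrative sketch: a scheme that matches *average-case* human alignment
# still admits a worst case of "the worst sampled human, amplified".
# The distribution and all constants are assumptions, not from the proposal.

random.seed(0)
AMPLIFICATION = 1000     # assumed capability multiplier of an AGI over an HI
N = 100_000              # number of sampled human-like alignments

# Abstract "misalignment" scores for randomly selected humans (heavy-ish tail).
humans = [random.lognormvariate(0, 1) for _ in range(N)]

avg_human = sum(humans) / N
worst_human = max(humans)

print(f"average-case human misalignment: {avg_human:.2f}")    # what the standard constrains
print(f"worst-case human misalignment:   {worst_human:.2f}")
print(f"worst case the scheme still admits: {worst_human * AMPLIFICATION:.2f}")
```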
The HI (Human Intelligence) alignment/safety problem is presently unsolved, in that it is impossible to predict the future alignment of a specific human with absolute certainty. This is awkward for the many industries that require high-reliability humans. I have long suspected that the AGI alignment problem will ultimately reduce to a case of the HI alignment problem (take an HI, give it effectively unlimited capability both to act on the world and to hide its actions, plus instantaneous cognition, and you have an AGI-equivalent).
The default solution, per a paper I’ve seen about the IQ-based communication barrier (no citation handy), is essentially ‘humans reflexively mistrust other humans with an IQ 30 or more points higher than their own’.
The challenges created by the possibility of a misaligned HI can obviously be solved locally by the model implemented in Equatorial Guinea in the 1970s (https://en.m.wikipedia.org/wiki/Francisco_Macías_Nguema), but this denies us the benefits of those HIs’ potential outputs.
If you could snap your fingers and build an AGI tomorrow, knowing the state of alignment research, would you do it?
How risk-tolerant are you relative to other humans who could enable the emergence of AGI?