A couple of notes from me: though I appreciate the effort you’ve put into this, especially the simulations, I overall disagree with the reasoning you’ve presented, so I thought I’d offer a few counter-points.
Whilst I don’t disagree that an “idiotic AGI” is conceptually possible, I think our main disagreement is that you believe AGIs will be sampled from a sufficiently large pool, similar to that of high-IQ people in the world today, so that we are guaranteed at least a few “idiotic AGIs” to tip us off. This assumption rests centrally on a world where either AGI is developed relatively simultaneously by a large number of different actors, OR it is widely shared once developed, so that many such different AGIs co-exist within a relatively short timespan. I have serious doubts that this is the timeline we are heading towards.
It is perfectly possible, for example, that when AGI is eventually developed it remains a singular (or single-digit count) guarded secret for a significant amount of time. If the AGI that happens to be developed turns out not to be an “idiotic AGI”, then we have an immediate problem. Even if it does turn out to be an “idiotic AGI” and displays serious errors in testing, it’s entirely possible these will be “mitigated”, again in secret, and that the AGI eventually released into the world will be far more capable and far less prone to “idiocy”: one that is equally far more capable of carrying out an existential attack, OR of simply deferring one until it has become an ASI and is more capable still.
I’d also note that you state quite clearly towards the beginning of the post that you are “not making any claims about the prevalence of foolish beliefs among the highly intelligent”, and yet in other places you state that “there are many humans, even highly intelligent ones, that act dumb or foolish” and that “foolish high-IQ people exist and are not vanishingly rare”. Either you are claiming simply that the phenomenon exists, or you are claiming you can demonstrate its prevalence. I don’t feel you’ve successfully demonstrated the latter, having offered only some fairly insubstantial evidence, so I will assume the former is the statement you actually want to make. Prevalence is, however, quite essential to the argument you are making, I think: it does matter whether it’s 3 or 30 out of 100 high-IQ people that are “foolish”.
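To make that concrete, here’s a minimal back-of-the-envelope sketch (Python, with made-up numbers rather than estimates) of how prevalence and the number of independently developed AGIs jointly determine the chance that at least one “idiotic AGI” tips us off, assuming each AGI is an independent draw from the same pool:

```python
# Illustrative only: chance of at least one "idiotic AGI" among n
# independently developed AGIs, if a fraction p of AGIs turn out "foolish".
# Both p and n are made-up numbers for the sake of argument, not estimates.

def p_at_least_one_foolish(p: float, n: int) -> float:
    """P(at least one of n independent AGIs is 'foolish') = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

for p in (0.03, 0.30):      # 3 vs 30 out of 100 "foolish"
    for n in (3, 100):      # a handful of secret AGIs vs a large pool
        print(f"p={p:.2f}, n={n:3d} -> {p_at_least_one_foolish(p, n):.3f}")
```

On those numbers, a large pool makes a tip-off nearly certain regardless of prevalence (0.95+), whereas with only three AGIs in existence the chance drops to roughly 9% or 66% depending on whether prevalence is 3% or 30%, which is why I think both the number of actors and the prevalence figure matter to your argument.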
There is also a discussion to be had about equating a “high-IQ” human with an AGI. The definition of AGI is still highly problematic, so I think we’re on pretty shaky ground making assumptions about what an AGI will and won’t be, and that in itself may be a weakness in your argument.
I think, however, that if we are to follow your “foolish humans” line of reasoning, a lot of the errors that humans in general (high-IQ or not) make are due to a combination of emotion and cognitive biases.
An AGI will (presumably) not make errors due to emotion, and it is highly debatable what cognitive biases (if any) an AGI will have. We are certainly introducing a bunch with RLHF, as you yourself mention, though whether that technique will (still) be in use when AGI is achieved is another tenuous assumption. Whilst you argue that hallucinations might themselves be an example of such a “cognitive bias” that may give away the “idiotic AGI’s” plan, it’s worth noting that eliminating hallucinations is a direct goal of current AI development. Perhaps we cannot expect their complete elimination, but as long as they are reduced to extremely small odds of occurring, expecting hallucinations to be the source of a “giveaway” alarm from an AGI is highly optimistic, if not (dare I say) unrealistic.
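To put some (entirely made-up) numbers on that, here’s a similar sketch: assume each externally observable, high-stakes action carries an independent per-action hallucination probability q, and that we get to observe k such actions.

```python
# Illustrative only: chance of at least one hallucination-style "giveaway"
# across k observed high-stakes actions, each with independent per-action
# hallucination probability q. Both q and k are assumptions, not estimates.

def p_giveaway(q: float, k: int) -> float:
    """P(at least one giveaway in k observed actions) = 1 - (1 - q)^k."""
    return 1 - (1 - q) ** k

for q in (1e-3, 1e-6):      # current-ish vs heavily reduced hallucination rate
    for k in (100, 1000):   # how many critical actions we actually observe
        print(f"q={q:g}, k={k:4d} -> {p_giveaway(q, k):.4f}")
```

On those numbers, driving the per-action rate from 10⁻³ down to 10⁻⁶ takes the chance of ever seeing a giveaway from roughly 10–63% down to roughly 0.01–0.1%, which is why I wouldn’t want to rely on hallucinations as an alarm.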
quasi_quasar
To add to bogdanb’s comment below: you might want to be careful, because you seem to be ‘forcing’ people to subscribe to promotional newsletters in order to get a price quote. Aside from being quite a nasty thing to do, this is also a blatant violation of the European GDPR, for which you could receive a hefty fine.