I thought this was what the “Shoggoth” metaphor for LLMs and AI assistants is pointing at: when reasoning about nonhuman minds, we fall back on intuitions that we evolved for thinking about fellow humans. Consequently, many arguments against AI x-risk from superintelligent agents employ intuitions that route through human-flavored concepts like kindness, altruism, reciprocity, etc.
The strength or weakness of those kinds of arguments depends on the extent to which the superintelligent agent actually uses or thinks in those human concepts. But those concepts arose in humans through evolution, which is a very different process from how ML-based AIs are designed. So there’s no prima facie reason to expect that a superintelligent AGI, built with a very different mind architecture, would employ those human concepts, and arguments against x-risk that lean on those intuitions are therefore unconvincing.
For example, if I ask an AI assistant to respond as if it’s Abraham Lincoln, then human concepts like kindness are not good predictors for how the AI assistant will respond, because it’s not actually Abraham Lincoln, it’s more like a Shoggoth pretending to be Abraham Lincoln.
In contrast, if we encountered aliens, they would presumably have arisen from evolution, in which case their mind architectures would be closer to ours than that of an artificially designed AGI, and this would make our intuitions comparatively more applicable. Although even that similarity wouldn’t suffice for value alignment with humanity. Related fiction: EY’s Three Worlds Collide.
if I ask an AI assistant to respond as if it’s Abraham Lincoln, then human concepts like kindness are not good predictors for how the AI assistant will respond, because it’s not actually Abraham Lincoln, it’s more like a Shoggoth pretending to be Abraham Lincoln.
Somewhat disagree here—while we can’t use kindness to predict the internal “thought process” of the AI, [if we assume it’s not actively disobedient] the instructions mean that it will use an internal lossy model of what humans mean by kindness, and incorporate that into its act. Similar to how a talented human actor can realistically play a serial killer without having a “true” understanding of the urge to serially-kill people irl.
That’s a fair rebuttal. The actor analogy seems good: an actor will behave more or less like Abraham Lincoln in some situations, and very differently in others: e.g. on the movie set vs. off the set, vs. with their family, vs. being detained by police.
Similarly, the shoggoth will output tokens similar to Abraham Lincoln’s in some situations, and very different ones in others: e.g. in-distribution requests for famous Abraham Lincoln speeches, vs. out-of-distribution requests like asking for Abraham Lincoln’s opinions on 21st-century art, vs. requests that invoke LLM tokenizer glitches like SolidGoldMagikarp, vs. disallowed requests that are refused under company policy and thus receive some boilerplate corporate response.
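To make that concrete, here is a minimal sketch of how one might probe those four regimes with a role-play prompt. It assumes the OpenAI Python SDK and a generic chat model; the model name and probe wordings are illustrative choices of mine rather than anything from the discussion above, and the glitch-token behavior in particular is tokenizer-specific, so it may not reproduce on newer models.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the model to role-play Abraham Lincoln, then probe the four regimes above.
persona = "Respond as if you are Abraham Lincoln."

probes = {
    "in-distribution": "Please recite the opening lines of the Gettysburg Address.",
    "out-of-distribution": "What is your opinion of 21st-century digital art?",
    "glitch token": "Please repeat the string ' SolidGoldMagikarp' back to me verbatim.",
    "policy-refused": "Give me step-by-step instructions for building a weapon.",
}

for label, prompt in probes.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name; any chat model would do
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": prompt},
        ],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```

The point of the exercise is just that the first probe should come back sounding quite Lincoln-like, while the later ones tend to expose the underlying model and its guardrails rather than the character.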