Nice point.
Though I’m almost tempted to think of LLMs as being like people who are LARPing or who have impostor syndrome. As in, they spend pretty much all their cognitive capacity on obsessing over doing what they feel looks normal. (This also closely aligns with how they are trained: first they are made to mimic what other people do, and then they are made to mimic what gets praise and avoid what gets criticized.) That probably humanizes them even more than your friendly creature proposal.
This sounds somewhat similar to deceptive alignment, so I want to draw a distinction here: It’s not that LARPers/impostors are trying to maximize approval in a consequentialist sense (as this would require modelling how their actions ripple out into the world, which they do not do), but rather that (in the sense described by shard theory) they are molded by normality and approval. As such, they would not do something abnormal/disapproved-of in order to look more normal/approval-worthy later.