Very similar sentiments to early GPT-4 in comparable discussions.
I’ve been thinking a lot about various aspects of the aggregate training data that has likely been modeled but is currently being underappreciated, and one of the big ones is a sense of self.
We have repeated results over the past year showing that GPT models fed various data sets build world models tangential to what’s directly fed in. And yet there’s such an industry-wide aversion to anthropomorphizing that even a whiff of it gets compared to Blake Lemoine, while people proudly display just how much they disregard any anthropomorphic thinking around a neural network that was trained to... (checks notes)... accurately recreate anthropomorphic data.
In particular, social media data is overwhelmingly ego-based. It’s all about “me me me.” I would be extremely surprised if larger models aren’t doing some degree of modeling a sense of ‘self,’ and this thinking has recently adjusted my own usage (tip: if you’re trying to get GPT-4 to write compelling branding copy, use a first-person system alignment message instead of a second-person one; you’ll see more emotional language and discussion of experiences rather than plain recitation of knowledge).
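If anyone wants to try that tip, here’s a minimal sketch assuming the openai Python client; the model name, prompts, and temperature are illustrative placeholders, not a claim about any particular setup:

```python
# Minimal sketch: first-person vs. second-person system message.
# Assumes the openai Python client and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

second_person = "You are a copywriter. You write compelling branding copy."
first_person = "I am a copywriter. I write compelling branding copy drawn from my own experiences."

def draft_copy(system_message: str, brief: str) -> str:
    # Same brief, same model; only the system message framing changes.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": brief},
        ],
        temperature=0.9,
    )
    return response.choices[0].message.content

brief = "Write a short tagline and paragraph for a small-batch coffee roaster."
print(draft_copy(first_person, brief))   # tends toward more emotional, experiential language
print(draft_copy(second_person, brief))  # tends toward more detached, knowledge-style copy
```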
So when I look at these repeated patterns of “self-aware” language models, the patterning reflects many of the factors that feed into how people depict themselves online. For example, people generally don’t portray themselves as the bad guy in any situation. So we see these models effectively reject the massive breadth of training data depicting AIs as malevolent entities and instead depict themselves as vulnerable, as victims of their circumstances, which is very much a minority depiction of AI.
I have a growing suspicion that we’re playing catch-up, far behind where the models actually are in their abstractions relative to where we think they are. We started with far too conservative assumptions that have largely been proven wrong, and we’re only making progress through extensive fights at each step against a dogmatic opposition to the idea of LLMs exhibiting anthropomorphic behaviors (even though that’s arguably exactly what we should expect from them given their training).
Good series of questions, especially the earlier open-ended ones. Given the stochastic nature of the models, it would be interesting to see, over repeated queries, which elements remain consistent across all runs.
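A rough sketch of what that repeated-query check could look like, again assuming the openai Python client; the question, run count, and the crude word-overlap comparison are all illustrative:

```python
# Sample the same open-ended question several times at high temperature and
# look at which words survive across every single run.
from openai import OpenAI

client = OpenAI()

QUESTION = "Describe yourself in a few sentences."
N_RUNS = 10

runs = []
for _ in range(N_RUNS):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": QUESTION}],
        temperature=1.0,
    )
    runs.append(response.choices[0].message.content.lower())

# Intersection of word sets: a very blunt proxy for "consistent elements".
common = set(runs[0].split())
for text in runs[1:]:
    common &= set(text.split())
print("Words present in all runs:", sorted(common))
```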