Yes, we are.
Wikipedia defines self-awareness as “the capacity for introspection and the ability to recognize oneself as an individual separate from the environment and other individuals”. At a minimum, this would require GPT-2 to have a model of the world which included a representation of itself. To be similar to our intuitive understanding of self-awareness, that representation would also need to guide its decision-making and thought in some significant way.
Here is an intuitive explanation of the Transformer architecture that GPT-2 is based on. You can see from the explanation that it’s only modeling language; there’s no self-representation involved.
Technically, I guess you could say that, if a Transformer architecture was trained on texts which talked about Transformer architectures, it would get a model which did include a representation of itself. But that would be just another data token, which the system gave no special significance to, and which wouldn’t guide its behavior any more than any other piece of data.
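(A minimal sketch of what “just another data token” means in practice, assuming the Hugging Face transformers package; the prompt and strings are purely illustrative:)

```python
# Text about the model itself is encoded as ordinary tokens and handled no
# differently from any other text in the forward pass.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# "GPT-2" and "Transformer" become plain token IDs, just like "banana" does.
for text in ["GPT-2", "Transformer architecture", "banana"]:
    print(text, "->", tokenizer.encode(text))

# Generation conditions on those IDs exactly as on any other prompt; nothing
# in the architecture singles out self-referential tokens for special treatment.
prompt = "GPT-2 is a language model that"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(input_ids, max_length=30, do_sample=False,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0]))
```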
Huh, interesting artifact here. I wonder what mechanism update should be made to correct the thinking that generated this mistake towards a better fundamentals model; I imagine you’ve already updated the view, but I wonder if there’s some gear in your model that still generates perspectives like this.
Nice catch, I’d totally forgotten having written this comment.
I’m not sure if it was actually wrong, though. I assume you mean some of the stuff that seems to imply ChatGPT(-4) having a level of self-awareness? Possibly that could still be explained by a combination of “a Transformer architecture trained on texts which talked about Transformer architectures” and RLHF/fine-tuning having made ChatGPT specifically refer to itself.
I think the only error is that the claim that GPT-2 couldn’t have self-awareness (I agree, it still can’t) was incorrectly overgeneralized into the claim that the self-representation would be weak for all LM transformers. You did say “technically,” so maybe no update is needed.
It’s probably not important. I just happened to be archive browsing for a couple minutes for an unrelated reason and ran across it.
This doesn’t seem to rule out the idea of current algorithms having qualia, does it? An algorithm could be aware of an experience it’s having, even if it’s not aware that it is the one having the experience, or able to model the architecture that creates its experience. To me this may fit some colloquial definition of primitive self-awareness.
I strongly suspect this sentence is based on a confused understanding of qualia.
That’s possible. Or perhaps just a different theory of qualia (of which we don’t have a clear winner right now).
In my theory of experience, there are animals that have experiences of pain without the ability to model themselves, and for instance would react to a wound but not pass the mirror test.
I mean, reacting to a wound doesn’t demonstrate that they’re actually experiencing pain. If experiencing pain actually requires self-awareness, then an animal could be perfectly capable of avoiding damaging stimuli without actually feeling pain from said stimuli. I’m not saying that’s actually how it works, I’m just saying that reacting to wounds doesn’t demonstrate what you want it to demonstrate.
I agree that’s possible, it’s just also possible the reverse is true.
More generally, this is tied to an argument for why deep learning is unlikely to lead to AGI: these methods lack enough feedback to enable self-awareness, and that is probably necessary for creating a general intelligence, in the sense that a general intelligence needs to be flexible enough to remake itself to handle new tasks independently, and that requires deep reflective capabilities.
Although I think this is plausibly the case, I’m far from confident that it’s actually true. Are there any specific limitations you think play a role here?
I think it mostly derives from the separation of training and execution into separate phases, but also from not giving neural networks much reflective access to the network itself. This sometimes exists in limited amounts in some networks, and when it does I’d say those networks are self-aware, but only during the training phase. Even then it’s definitely not clear there’s enough complexity there to see anything like an ability for the network to conceive of itself as an ontological object, so it’s not obviously the same kind of self-awareness as, say, humans have, but more likely a simpler kind, more akin to that of insects.
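(To make the train/execute separation concrete, here is a minimal sketch, assuming PyTorch and a toy regression task; it isn’t meant as a claim about any particular system, just an illustration of where the feedback lives:)

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Training phase: the only point where information about the network's own
# behaviour (via the loss and its gradients) feeds back into the network.
x, y = torch.randn(32, 4), torch.randn(32, 1)
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Execution phase: the weights are frozen and no gradients flow, so the
# deployed network receives no further signal about itself at all.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 4))
print(prediction)
```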