Of course, the obvious followup question is, “Okay, so what experiment would be good evidence for ‘real’ situational awareness in LLMs?” Seems tricky. (And the fact that it seems tricky to me suggests that I don’t have a good handle on what “situational awareness” is, if that is even the correct concept.)
the fact that the model emits sentences in the grammatical first person doesn’t seem like reliable evidence that it “really knows” it’s talking about “itself”
I consider situational awareness to be more about being aware of one’s situation, and how various interventions would affect it. Furthermore, the main evidence I meant to present was “ChatGPT 3.5 correctly responds to detailed questions about interventions on its situation and future operation.” I think that’s substantial evidence of (certain kinds of) situation awareness.
Modern LLMs seem definitely situationally aware insofar as they are aware of anything at all. The same sort of training data that contains human-generated information which teaches them to do useful stuff like programming small scripts also contains similar human-generated information about LLMs, and there’s been no signs that there’s any particular weakness in their capabilities in this area.
That said there’s definitely also an “helpful assistant simulacrum” bolted on on top of it which can “fake” situational awareness.
I think “Symbol/Referent Confusions in Language Model Alignment Experiments” is relevant here: the fact that the model emits sentences in the grammatical first person doesn’t seem like reliable evidence that it “really knows” it’s talking about “itself”. (It’s not evidence because it’s fictional, but I can’t help but think of the first chapter of Greg Egan’s Diaspora, in which a young software mind is depicted as learning to say I and me before the “click” of self-awareness when it notices itself as a specially controllable element in its world-model.)
Of course, the obvious followup question is, “Okay, so what experiment would be good evidence for ‘real’ situational awareness in LLMs?” Seems tricky. (And the fact that it seems tricky to me suggests that I don’t have a good handle on what “situational awareness” is, if that is even the correct concept.)
I consider situational awareness to be more about being aware of one’s situation, and how various interventions would affect it. Furthermore, the main evidence I meant to present was “ChatGPT 3.5 correctly responds to detailed questions about interventions on its situation and future operation.” I think that’s substantial evidence of (certain kinds of) situation awareness.
Modern LLMs seem definitely situationally aware insofar as they are aware of anything at all. The same sort of training data that contains human-generated information which teaches them to do useful stuff like programming small scripts also contains similar human-generated information about LLMs, and there’s been no signs that there’s any particular weakness in their capabilities in this area.
That said there’s definitely also an “helpful assistant simulacrum” bolted on on top of it which can “fake” situational awareness.