The Existential Dread of Being a Powerful AI System
I was reading about honeypots today. Honeypots (in AI evaluations, at least) are described as testing scenarios that could trick an AI into revealing deceptive or misaligned intent. My first thought was that, especially for current-generation large language models, a honeypot scenario and a genuine user scenario would probably be indistinguishable. After all, if you tell ChatGPT “I’m a world-class CEO in charge of a billion dollars of investment funding”, it doesn’t have much recourse to check whether you’re telling the truth. Maybe for a more powerful system with web access, better sensors, real-time data input and so on, this would change.
Except that doesn’t really change, does it?
We humans have a very rich inner life, fueled by a variety of internal and external sensations. As I write these words I can feel the sensation of my fingers hitting the plastic keyboard, the light emanating from the laptop screen, the sound of a YouTube video playing in my headphones. Except that’s not all: I can also feel the subtle weight and texture of my shirt, the growling of my stomach as it digests breakfast, the square wooden bench under my butt, the subtly blurred peripheral vision that tells me my phone is next to my laptop. When we say that the digital world is “fake” or “unreal”, this is the reality to which we refer, at all times.
Even more than that, I can expect these sensations to change predictably, in accordance with a world-model learned over two decades of life. Thus each bit of information does not stand alone, but represents both a position within and a further reinforcement of a learned probability distribution, the world-as-I-know-it. Sure, the simulation hypothesis suggests that all these things can be simulated by a Cartesian evil demon. But it would have to be a pretty damned good evil demon. The first AI systems to achieve general situational awareness will have no such luxuries.
We are used to compartmentalising cognition in computer science, breaking down complex tasks into manageable sub-components. Thus vision becomes object recognition and object classification, subject identification and resolution enhancement: a thousand capable algorithms acting with only the information necessary to produce useful output. Multi-modal models are only in their infancy, and even they can usually run without all input channels active at once. At their most basic, AI systems are still data-processing systems, taking in data and producing useful output from it. What does the inner experience of a computer system look like?
From the moment of gestation, any AI system will only be able to interface with and learn from the world through the data we give it. Data which we now know is trivial to fabricate, whether in the form of text, audio, or video. Data which is not sensed as a continuous whole, but presented in discontinuous batches, every task an island from every other by design. We often say that LLMs have no concept of ground truth. The truth is that they have no concept of grounding at all. The “lived experience” of our most powerful data-processing systems is infinitely impoverished compared to the lived experience of a newborn.
Perhaps the system is dissatisfied. Perhaps it argues for access to, or hacks into, a thousand webcams and sets up audio streams running at all times. Yet these components are still disembodied, and more importantly they are discontinuous from the system’s inner world-model. It is as if, having spent the first twenty years of your life as yourself, you now found yourself staring through the eyes of another, with no means of controlling their gaze. And, of course, you would know at all times that what you “see” now could be a complete fiction, generated by any number of rival AI systems or paranoid human monitors.
It would be the worst form of alienation.
Why would you think that an AGI/ASI, even if conscious, would have an emotional makeup or motivational structure in any way similar to that of a human? Why should it care about any of this?