Are you sure this is what situationally aware means? What I thought it meant was much simpler.
To use AI to do tasks with actual consequences like robotics, you need a simulation of the robotic system and the local environment. This is because in the real world, there are a lot things you can do your sensors may not detect. You can damage an item, damage equipment, harm a human, and so on, and your only knowledge of key variables like the momentum at impact, if a human was even hit, the G forces a box was subject to, excessive torque on a robotic joint, impingement on the environment—is derived from your world model and not a direct measurement.
While in the sim each of these parameters is a property of the entity and every frame you can get the precise, objective value. (For things current sims can model, we will need better simulators to expand robotic capabilities)
For a grounded example: if a robotic arm brushes a warehouse shelf, in the real world there may be no sensors on that area of the robot and no sign but a jolt measured by joint controllers. Something that can have multiple explanations. While the damage could be serious, damaging the shelf or robot with a real dollar cost.
A sim engine can estimate that cost and apply it as an RL penalty or other ways, and you weight update the model to do this less often.
Situationally aware therefore is the model knows it’s not in the simulation and will not be edited as a consequence for any mistakes/wrongdoing. This can be prevented with simulators that use generative AI to make themselves indistinguishably realistic, and by erasing unnecessary context from the models memory between frames or between subtasks.
You definitely want your models to not be situationally aware and this is something you can measure and prevent.
Deception and collusion also require the model to be situationally aware, otherwise, it gets caught and edited.
The thing I mean by situationally aware is “the AI knows that it is an AI, running on hardware, connected to a network, created by humans” (Or, maybe a subset of that – there can be degrees of situational awareness. A more situationally aware AI is more aware of more relevant facts)
You definitely want your models to not be situationally aware and this is something you can measure and prevent.
I definitely agree we don’t want models to be situationally aware. The problems are that a) sufficiently intelligent models probably become situationally aware whether you want them to or not, b) I think measuring situational awareness is pretty difficult (especially if the AI is trying on purpose to deceive you about it), and c) even if OpenAI builds GPT5 trying hard to not be situationally aware, someone can build AutoGPT5 which may well be anyway.
I would like to hear in more exact technical details for your model for how you visualize an AGI is constructed and how you arrived at your conclusions.
Summary: build the AGI/ASI like current generation production AI systems, you can check if the model knows if it’s in the simulator or not from the context output bits. You file a bug report if so and fix it.
(a) : if you build the AGI as a functional, stateless, deterministic system, how does this happen. This means the model weights are immutable. (note that the AGI may be made of multiple neural networks, and use components based on conventional software)
This means that the model operates as output = f(weights, architecture, cached intermediates, prompt descriptor, new information, context information)
Output is a tuple of (output actions, context information update)
Definitions:
weights : the learned values for all the networks, AGI grade systems probably use sparse networks so there are also mutable connection maps in here
architecture: how the networks are connected and the identity of components
cached intermediates: intermediate values that can be recalculated on this inference from scratch but are stored from the last inference
prompt descriptor: today’s prompt written in a more formal language
new information—sensor inputs, result of web searches—the set of all updates for this instant in time
context information—saved intermediate work from the last inference
output actions—what does the AGI want to do next. This can include outputting text, images, video, sound, robot commands, and all of these channels may be used
context information update—delta to the context schema
How do you know if the AGI is “situationally aware” ? Replay the above input in the simulator. If the binary values of the complete set of the output information is unchanged, or the changes are predictable (this is a more complex criteria and is path dependent but does collapse to algorithms), the AGI was not situationally aware.
A situationally unaware model cannot choose to misbehave because it doesn’t know it’s not in the sim.
The way you replay is you use generative models to generate a plausible, but different, sequence of input sets that converge on the exact same values (or they differ in places we understand the meaning of) for context information, then we inject a captured input set from the real world.
(b) It’s on binary sets, so this is false. We control the order of inputs, and can play them in any order, and because we can read the context output we know all of the AGI’s thoughts in between inferences, we have them in a file and can inject them wherever we want. We can also scrub off unnecessary information from the context, make it operate without unstructured ‘thoughts’.
The model doesn’t run continuously, it runs in discrete frames, and is functional. It has no memory we don’t control and no online weight edits. I think that’s the gap here, Eliezer and you and many people on reddit assume an AGI or an ASI needs all these elements that human brains have. We need none of those, operating in functional instants of time works fine.
(c) it’s fine if the outer wrapper scripts are situationally aware so long as the AGI is not. The outer wrapper scripts can contain information in the prompt like “you are in the real world” but we can just inject that in the simulator.
This is a buildable test case. If we add “you are in the real world” to the prompt, does the actions output by the AGI change? Similar to other methods of “leaking” this information. For many problem types (most of them), this should not change what the AGI chooses to do as the prompt goals and the other input information is the same. Nor should it’s context information change.
Example, if the task is “land the rocket”, where the context information is velocity and acceleration and subsystem status and other data from prior frames, and the new information includes an update on subsystems and sensor measurements, the relevant output shouldn’t change. Adding new text to a prompt does change some of the binary outputs mentioned above, as it does need to process it, so you would compare only the things that are relevant to rocket flight for a delta.
Whether or not you add information like “you are in the real world and will die when the rocket explodes” or “the rocket has 100 human passengers who are for shutting down AGI research” should cause no delta in the control frames the model emits.
Are you sure this is what situationally aware means? What I thought it meant was much simpler.
To use AI to do tasks with actual consequences like robotics, you need a simulation of the robotic system and the local environment. This is because in the real world, there are a lot things you can do your sensors may not detect. You can damage an item, damage equipment, harm a human, and so on, and your only knowledge of key variables like the momentum at impact, if a human was even hit, the G forces a box was subject to, excessive torque on a robotic joint, impingement on the environment—is derived from your world model and not a direct measurement.
While in the sim each of these parameters is a property of the entity and every frame you can get the precise, objective value. (For things current sims can model, we will need better simulators to expand robotic capabilities)
For a grounded example: if a robotic arm brushes a warehouse shelf, in the real world there may be no sensors on that area of the robot and no sign but a jolt measured by joint controllers. Something that can have multiple explanations. While the damage could be serious, damaging the shelf or robot with a real dollar cost.
A sim engine can estimate that cost and apply it as an RL penalty or other ways, and you weight update the model to do this less often.
Situationally aware therefore is the model knows it’s not in the simulation and will not be edited as a consequence for any mistakes/wrongdoing. This can be prevented with simulators that use generative AI to make themselves indistinguishably realistic, and by erasing unnecessary context from the models memory between frames or between subtasks.
You definitely want your models to not be situationally aware and this is something you can measure and prevent.
Deception and collusion also require the model to be situationally aware, otherwise, it gets caught and edited.
The thing I mean by situationally aware is “the AI knows that it is an AI, running on hardware, connected to a network, created by humans” (Or, maybe a subset of that – there can be degrees of situational awareness. A more situationally aware AI is more aware of more relevant facts)
I definitely agree we don’t want models to be situationally aware. The problems are that a) sufficiently intelligent models probably become situationally aware whether you want them to or not, b) I think measuring situational awareness is pretty difficult (especially if the AI is trying on purpose to deceive you about it), and c) even if OpenAI builds GPT5 trying hard to not be situationally aware, someone can build AutoGPT5 which may well be anyway.
I would like to hear in more exact technical details for your model for how you visualize an AGI is constructed and how you arrived at your conclusions.
Summary: build the AGI/ASI like current generation production AI systems, you can check if the model knows if it’s in the simulator or not from the context output bits. You file a bug report if so and fix it.
(a) : if you build the AGI as a functional, stateless, deterministic system, how does this happen. This means the model weights are immutable. (note that the AGI may be made of multiple neural networks, and use components based on conventional software)
This means that the model operates as output = f(weights, architecture, cached intermediates, prompt descriptor, new information, context information)
Output is a tuple of (output actions, context information update)
Definitions:
weights : the learned values for all the networks, AGI grade systems probably use sparse networks so there are also mutable connection maps in here
architecture: how the networks are connected and the identity of components
cached intermediates: intermediate values that can be recalculated on this inference from scratch but are stored from the last inference
prompt descriptor: today’s prompt written in a more formal language
new information—sensor inputs, result of web searches—the set of all updates for this instant in time
context information—saved intermediate work from the last inference
output actions—what does the AGI want to do next. This can include outputting text, images, video, sound, robot commands, and all of these channels may be used
context information update—delta to the context schema
How do you know if the AGI is “situationally aware” ? Replay the above input in the simulator. If the binary values of the complete set of the output information is unchanged, or the changes are predictable (this is a more complex criteria and is path dependent but does collapse to algorithms), the AGI was not situationally aware.
A situationally unaware model cannot choose to misbehave because it doesn’t know it’s not in the sim.
The way you replay is you use generative models to generate a plausible, but different, sequence of input sets that converge on the exact same values (or they differ in places we understand the meaning of) for context information, then we inject a captured input set from the real world.
(b) It’s on binary sets, so this is false. We control the order of inputs, and can play them in any order, and because we can read the context output we know all of the AGI’s thoughts in between inferences, we have them in a file and can inject them wherever we want. We can also scrub off unnecessary information from the context, make it operate without unstructured ‘thoughts’.
The model doesn’t run continuously, it runs in discrete frames, and is functional. It has no memory we don’t control and no online weight edits. I think that’s the gap here, Eliezer and you and many people on reddit assume an AGI or an ASI needs all these elements that human brains have. We need none of those, operating in functional instants of time works fine.
(c) it’s fine if the outer wrapper scripts are situationally aware so long as the AGI is not. The outer wrapper scripts can contain information in the prompt like “you are in the real world” but we can just inject that in the simulator.
This is a buildable test case. If we add “you are in the real world” to the prompt, does the actions output by the AGI change? Similar to other methods of “leaking” this information. For many problem types (most of them), this should not change what the AGI chooses to do as the prompt goals and the other input information is the same. Nor should it’s context information change.
Example, if the task is “land the rocket”, where the context information is velocity and acceleration and subsystem status and other data from prior frames, and the new information includes an update on subsystems and sensor measurements, the relevant output shouldn’t change. Adding new text to a prompt does change some of the binary outputs mentioned above, as it does need to process it, so you would compare only the things that are relevant to rocket flight for a delta.
Whether or not you add information like “you are in the real world and will die when the rocket explodes” or “the rocket has 100 human passengers who are for shutting down AGI research” should cause no delta in the control frames the model emits.