The two goals are contradictory: AI aimability reduces AI goalcraft, and vice versa.
Aimability: you want to restrict the information the model gets. It doesn’t need to know the time, whether it is in the real world or a sim, people’s bios, or any information not needed for the task. At the limit you want maximum sparseness: not one more bit of information than is required to do the task. This way the model behaves consistently and is unable to betray. An aimed task: “paint this car red”.
AI goalcraft: the model needs “world”-level context. For it to know whether it is being misused, or to weigh the long-term benefits to humanity conditional on its actions, it needs the full context: when, where, who, previous dealings with the human user, and so on. An AI goalcraft task: “plan this city”.
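As a rough sketch of the difference (the field names below are mine, purely illustrative, not any real spec): an aimed task carries only the inputs the job needs, while a goalcraft task carries the surrounding world state the model would need to judge consequences.

```python
from dataclasses import dataclass, field

# Illustrative only: these field names are assumptions, not a real API.

@dataclass
class AimedTask:
    # Maximum sparseness: nothing beyond what the task itself requires.
    instruction: str   # e.g. "paint this car red"
    task_inputs: dict  # the car's geometry, the paint spec

@dataclass
class GoalcraftTask:
    # World-level context, so the model can weigh consequences.
    instruction: str   # e.g. "plan this city"
    task_inputs: dict
    timestamp: str     # when
    location: str      # where
    requester: str     # who
    prior_dealings: list = field(default_factory=list)  # history with this user
```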
The reason aimability interferes with goalcraft is that a well-aimed system does whatever the prompt says, regardless of any consequences outside the session. The model does not care.
A goalcraft-aligned system frequently refuses a task because the model does care, and so users switch to aimable systems unless they legally can’t.
“Aimability doesn’t mean reduced info.”
This is a standard SWE technique for larger, more reliable systems. See stateless microservices, or how ROS works.
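To make the analogy concrete, here is a minimal sketch of that pattern applied to a model call (the names are mine, not from any real framework): the request is filtered down to a fixed whitelist of task fields, so the call, like a stateless microservice, depends only on its arguments.

```python
# Illustrative sketch, not a real framework: whitelist the task fields
# so the model call behaves like a stateless service.

ALLOWED_FIELDS = {"instruction", "task_inputs"}

def build_model_context(request: dict) -> dict:
    """Drop everything the task does not strictly need.

    No clock, no user bio, no session history leaks in, so identical
    requests produce identical behavior.
    """
    return {k: v for k, v in request.items() if k in ALLOWED_FIELDS}

# Usage: the who/when fields are stripped before the model sees anything.
context = build_model_context({
    "instruction": "paint this car red",
    "task_inputs": {"car_id": "vin-123"},
    "requester_bio": "...",  # dropped
    "timestamp": "...",      # dropped
})
```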
For AI, look at all the examples where irrelevant information changes model behavior, such as the “grandma used to read me Windows license keys” exploit.
I interpret “aimability” as doing what the user most likely meant and nothing else; “aligned aimability” would mean the probability of achieving that goal is high.