Insight from “And All the Shoggoths Merely Players”.
We know about Simulacrum Levels.
Simulacrum Level 1: Attempt to describe the world accurately.
Simulacrum Level 2: Choose what to say based on what your statement will cause other people to do or believe.
Simulacrum Level 3: Say things that signal membership to your ingroup.
Simulacrum Level 4: Choose which group to signal membership to based on what the benefit would be for you.
I suggest adding another one:
Simulacrum Level NaN: Choose what statement you think is best for you to say to achieve your goals, without meaningful conceptual boundaries between “other people believe” and other consequences, or between “say” and other actions. (UPD: the earlier phrasing, “choose what to say based on what changes your statement will cause in the world”, was too CDT-like.)

It’s similar to Level 2, but it’s not the same. And it seems that to solve the deception problem in AI you need to rule out Level NaN before you can rule out Level 2. If you want your AI not to lie to you, you first need to make sure it communicates with you at all.
ASP (Agent Simulates Predictor) illustrates how greedy consequentialism doesn’t quite work. A variant of UDT where a decision chooses a legible policy that others get to know resolves some of the issues with failing to make it easier for others to reason about you. Ultimately, choosing legible beliefs or meanings of actions is a coordination problem, not a matter of one-sided optimization.
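To make the contrast concrete, here is a minimal toy sketch in Python: my own illustrative Newcomb-like setup with made-up payoffs and function names, not a faithful formalization of ASP or of any UDT variant. The predictor simulates the agent’s policy, so an agent that commits to a legible one-boxing policy does better than a greedy one that grabs the extra transparent payoff.

```python
# Toy sketch of the point above. Payoffs, names, and the whole setup are
# illustrative assumptions, not taken from the original comment.

def predictor(agent_policy):
    # The predictor simulates the agent deciding as if the opaque box were
    # filled, and only fills it if the simulated agent one-boxes.
    simulated_choice = agent_policy(prediction="filled")
    return "filled" if simulated_choice == "one-box" else "empty"

def greedy_agent(prediction):
    # Greedy consequentialism: treat the prediction as already settled and
    # take the extra transparent $1,000 regardless.
    return "two-box"

def legible_agent(prediction):
    # A legible policy the predictor can verify by simulation: always one-box.
    return "one-box"

def payoff(prediction, choice):
    opaque = 1_000_000 if prediction == "filled" else 0
    transparent = 1_000 if choice == "two-box" else 0
    return opaque + transparent

for name, policy in [("greedy", greedy_agent), ("legible", legible_agent)]:
    pred = predictor(policy)           # predictor simulates the policy
    choice = policy(prediction=pred)   # the actual agent then acts
    print(name, pred, choice, payoff(pred, choice))
    # greedy:  empty,  two-box, 1000
    # legible: filled, one-box, 1000000
```

The legible agent doesn’t out-think the predictor; it wins by being easy for the predictor to think about, which is the coordination framing rather than the one-sided-optimization framing.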
Logical uncertainty motivates treating things in the world (including yourself) separately from the consequences they determine, and interacting with things in terms of coordination rather than consequences. So I vaguely expect meaningful communication between agents and formulation of boundaries around agents and ideas in the world to fall out of decision theory that fixes these issues with consequentialism, at least for the agents and ideas that persist.
Yes, I agree I formulated it in too CDT-like a way; it’s now fixed. But I think the point still stands.
Most variants of UDT also suffer from this issue by engaging in commitment racing instead of letting the rest of the world take its turns concurrently, in coordination between shaping and anticipating the agent’s intention. So the clue I’m gesturing at is more about consequentialism vs. coordination than about causal vs. logical consequences.
I think that for LLMs, the boundaries of human ideas are strong enough in the training corpus for post-training to easily elicit them, and the decision-theoretic consequences of deep change in the longer term might still maintain them, as long as humans remain at all.