TurnTrout comments on Unpacking “Shard Theory” as Hunch, Question, Theory, and Insight

TurnTrout 15 Dec 2022 3:51 UTC
LW: 4 AF: 4
While Quintin and I were careful in selecting the name “shard”, I think that calling the present version “shard theory” may have been a mistake, in part for the reasons you note. We aren’t at the “precise predictions” phase yet, but I do think present shard theory makes some informal predictions.
For example, I think that agents will competently generalize in multiple ways, depending on the context they find themselves in.
Concretely, consider an agent trained via deep RL on mazes where the exit is randomly placed in the right half and the agent starts on the left. The trained policy won’t just be running search to reach the end of the maze, with a globally activated mesa-objective across all possible contexts. Rather, the agent may have a “going right” policy sub-component, which increases logits on actions that move towards the right half of the maze, and it may also have a “go to red thing” policy sub-component, if the maze exit was red. So an agent might generalize to go right in the absence of red items, go towards red squares when they are visible, and do both in a competent, contextual way.
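To make the informal prediction a bit more concrete, here is a toy sketch of two contextually activated sub-components whose logit contributions are summed. Everything in it is an illustrative assumption rather than a description of what a trained network actually contains: the function names (`go_right_shard`, `go_to_red_shard`, `policy_logits`), the hand-picked weights 1.0 and 2.0, and the grid representation are all hypothetical, and a real policy would learn such components rather than have them written by hand.

```python
import math

ACTIONS = ["up", "down", "left", "right"]
# (row, col) displacement for each action; row 0 is the top of the grid.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def go_right_shard(agent_pos, red_pos):
    """Hypothetical sub-component: always adds logit mass to 'right'."""
    return {a: (1.0 if a == "right" else 0.0) for a in ACTIONS}


def go_to_red_shard(agent_pos, red_pos):
    """Hypothetical sub-component: only active when a red square is visible."""
    if red_pos is None:
        return {a: 0.0 for a in ACTIONS}
    logits = {}
    before = manhattan(agent_pos, red_pos)
    for a in ACTIONS:
        dr, dc = MOVES[a]
        after = manhattan((agent_pos[0] + dr, agent_pos[1] + dc), red_pos)
        # +2.0 if the action moves closer to red, -2.0 if it moves away.
        logits[a] = 2.0 * (before - after)
    return logits


def policy_logits(agent_pos, red_pos):
    """Sum the contributions of every sub-component (assumed additive)."""
    shards = [go_right_shard, go_to_red_shard]
    total = {a: 0.0 for a in ACTIONS}
    for shard in shards:
        for a, v in shard(agent_pos, red_pos).items():
            total[a] += v
    return total


def softmax(logits):
    m = max(logits.values())
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def greedy_action(agent_pos, red_pos):
    logits = policy_logits(agent_pos, red_pos)
    return max(ACTIONS, key=lambda a: logits[a])
```

With no red square in view, only the “going right” component contributes, so `greedy_action((2, 2), None)` picks `"right"`. With a red square directly above the agent, the “go to red thing” component outweighs it under these assumed weights, so `greedy_action((2, 2), (0, 2))` picks `"up"`. Neither behavior comes from a single global objective; each comes from whichever components the context activates.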
What links here?
- Disentangling Shard Theory into Atomic Claims by Leon Lang (13 Jan 2023 4:23 UTC; 86 points)
- TurnTrout's comment on wrapper-minds are the enemy by nostalgebraist (16 Dec 2022 19:01 UTC; 9 points)