I think we could get a GPT-like model to do this if we inserted other random sequences into the training data in the same way; it should learn a pattern like “non-word-like sequences that repeat at least twice tend to repeat a few more times,” or something like that.
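For concreteness, here is a minimal sketch of what that data augmentation might look like, assuming a plain-text corpus and simple letter-string “non-word-like” sequences; the alphabet, sequence length, repeat counts, and insertion scheme are all my own hypothetical choices rather than anything pinned down above.

```python
import random
import string

def make_random_sequence(length=8, rng=random):
    """A 'non-word-like' token: a string of random lowercase letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def insert_repeated_sequences(corpus_lines, n_insertions=1000, seed=0):
    """Scatter random sequences through the corpus, each repeated 2-5 times,
    so the model can pick up 'sequences that repeat tend to keep repeating'."""
    rng = random.Random(seed)
    lines = list(corpus_lines)
    for _ in range(n_insertions):
        seq = make_random_sequence(rng=rng)
        repeats = rng.randint(2, 5)            # repeat at least twice
        pos = rng.randrange(len(lines) + 1)    # pick an insertion point
        lines.insert(pos, " ".join([seq] * repeats))
    return lines

if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "a perfectly ordinary sentence"]
    for line in insert_repeated_sequences(corpus, n_insertions=3):
        print(line)
```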
GPT-3 itself may or may not get the idea, since it does have some significant breadth of getting-the-idea-of-local-patterns-it’s-never-seen-before.
So I don’t currently see what your experiment has to do with the planning-ahead question.
I would say that the GPT training process has no “inherent” pressure toward Bellman-like behavior, but the data provides such pressure, because humans are doing something more Bellman-like when producing strings. A more obvious example would be if you trained a GPT-like system to predict the chess moves of a tree-search planning agent.
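To make that last example concrete in miniature, here is a hedged sketch (using single-pile Nim rather than chess, purely for brevity) of a tree-search agent that picks moves by backing values up from successor positions, and that emits the kind of move strings a GPT-like model would then be trained to predict. The game, the win/loss value convention, and the output format are all illustrative assumptions, not anything from the discussion itself.

```python
from functools import lru_cache
import random

MAX_TAKE = 3  # in single-pile Nim, a move removes 1-3 stones; last stone wins

@lru_cache(maxsize=None)
def value(pile):
    """Value of the position for the player to move: +1 win, -1 loss.
    Computed by backing up values from successor positions (the Bellman-like part)."""
    if pile == 0:
        return -1  # the previous player took the last stone, so the mover has lost
    return max(-value(pile - take) for take in range(1, min(MAX_TAKE, pile) + 1))

def best_move(pile):
    """Choose the move whose backed-up value is best for the player to move."""
    return max(range(1, min(MAX_TAKE, pile) + 1), key=lambda t: -value(pile - t))

def play_game(start=17):
    """Play one full game with the tree-search agent and return it as a move
    string - the kind of training string a GPT-like predictor would see."""
    pile, moves = start, []
    while pile > 0:
        take = best_move(pile)
        moves.append(str(take))
        pile -= take
    return " ".join(moves)

if __name__ == "__main__":
    # Each line is a game played by the planning agent; a sequence model trained
    # on many such lines is pushed by the data toward the planner's behavior.
    for start in random.Random(0).sample(range(5, 30), 5):
        print(f"start={start}: {play_game(start)}")
```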