I would definitely expect that if we could come up with a story sufficiently out of distribution relative to our world (although I think this is pretty hard by definition), it would find some similar mechanism to oscillate back to ours as soon as possible (although this would also be much harder with base GPT, since it has less confidence about which world it's in).
Depends on what you mean by story. Not sure what GPT would do if you gave it the output of a random Turing machine. You could also use the state of a random cell inside a cellular automaton as your distribution.
I was thinking of some kind of prompt that would lead to GPT trying to do something as "environment agent-y" as ending one story and starting a new one; i.e., behaviour from some class that has an expected shape under the prior and then deviates from it pretty hard. There's probably some analogue with something like the output of random Turing machines, but for the specific thing I was pointing at, this seemed like a cleaner example.
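For concreteness, here's a minimal sketch of the cellular-automaton version of the probe mentioned above: track one randomly chosen cell of a Rule 30 automaton and dump its bit history as a plain-text prompt for a base model. Rule 30, the width, and the step count are my own illustrative choices, not anything specified in the discussion.

```python
import random

RULE = 30  # elementary CA rule; any rule number 0-255 would do


def step(cells: list[int]) -> list[int]:
    """Advance one elementary-CA step with wraparound boundaries."""
    n = len(cells)
    return [
        (RULE >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def cell_history_prompt(width: int = 64, steps: int = 200, seed: int = 0) -> str:
    """Return the bit history of one random cell as a text prompt."""
    rng = random.Random(seed)
    cells = [rng.randint(0, 1) for _ in range(width)]
    tracked = rng.randrange(width)  # the single cell whose state we record
    bits = []
    for _ in range(steps):
        bits.append(str(cells[tracked]))
        cells = step(cells)
    return "".join(bits)


print(cell_history_prompt())  # e.g. "01101001...", fed to a base model as-is
```

The resulting bitstring has essentially no natural-language structure, so it's about as far from the model's story prior as a prompt can get while still being valid text.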