First, demonstrate each subcomponent above in isolation. E.g., if we’re trying to demonstrate that treacherous turns are possible, but models lack some relevant aspect of situational awareness, then include the relevant information about the model’s situation in the prompt.
′ petertodd’ (the glitch token) is a case that threacherous turns are possible and was out in the wild until OpenAI patched it in Feb. 2023.
′ petertodd’ (the glitch token) is a case that threacherous turns are possible and was out in the wild until OpenAI patched it in Feb. 2023.