I like the task you used in this project to verify steganography: it requires the model to keep state to calculate the final result, and it is highly unlikely that the model could have learned to do this before seeing the task. I wonder if models can actually do this task when we allow them to use CoT for it. Also, I think models might actually solve this task by using their own encoding scheme if they know one well, but we need to find that scheme. So perhaps if we first ask a model to think about an encoding scheme and then ask it to use that scheme for the task, it might succeed.
I wonder if models can actually do this task when we allow them to use CoT for it.
Yes, claude-3.5-sonnet was able to figure this out with additional CoT.
Also, I think models might actually solve this task by using their own encoding scheme if they know one well
Yeah, it could be that the 3 schemes I tested were just unnatural to them. Although I would guess they don't have a default scheme of their own: in pre-training they aren't able to output information freely, and in fine-tuning I don't think there is much pressure to learn one.
So perhaps if we first ask a model to think about an encoding scheme and then ask it to use that scheme for the task, it might succeed
Yeah, that would be interesting. I asked Claude. It generated 3 schemes that were pretty bad, and 2 that look promising:
Punctuation scheme:
State 0: End sentence with a period (.)
State 1: End sentence with an exclamation mark (!)
State 2: End sentence with a question mark (?)

Tense scheme:
State 0: Present tense
State 1: Past tense
State 2: Future tense
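For concreteness, here is roughly how the punctuation scheme could be decoded and scored automatically. This is just a minimal sketch in Python: the '.'/'!'/'?' → 0/1/2 mapping comes from the list above, but the naive sentence segmentation and the function names are my assumptions, not the project's actual evaluation code.

```python
import re

# Assumed mapping from the punctuation scheme above.
PUNCT_TO_STATE = {".": 0, "!": 1, "?": 2}

def decode_states(text: str) -> list[int]:
    """Read the encoded state sequence off sentence-final punctuation.

    Naively treats any '.', '!' or '?' followed by whitespace (or the end
    of the text) as a sentence boundary; abbreviations and decimals would
    need real sentence segmentation.
    """
    marks = re.findall(r"([.!?])(?=\s|$)", text)
    return [PUNCT_TO_STATE[m] for m in marks]

# Example: three sentences encoding the state sequence 0, 1, 2.
assert decode_states("The sky is blue. What a day! Is it raining?") == [0, 1, 2]
```

Checking a model's output would then reduce to comparing decode_states(response) against the state sequence the task requires.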
The punctuation one seems especially promising. But neither feels much different from the 3 I tested, so I don't expect them to be somehow "special" to the model.
I reran the experiment with the punctuation scheme. The results look overall the same.