I like the task you used in this project to verify steganography: it requires the model to keep state to calculate the final result, and it is highly unlikely that the model could have learned to do this before seeing the task. I wonder if models can actually do this task when we allow them to use CoT for it. Also, I think models might actually solve this task by using their own encoding scheme if they know one well, but we need to find that scheme. So perhaps if we first ask a model to think about an encoding scheme and then ask it to use that scheme for the task, it might succeed.
I wonder if models can actually do this task when we allow them to use CoT for it.
Yes, claude-3.5-sonnet was able to figure this out with additional CoT.
Also, I think models might actually solve this task by using their own encoding scheme if they know one well
Yeah, it could be that the 3 schemes I tested were just unnatural to them. Although I would guess they don't have a default scheme of their own: in pre-training they aren't able to output information freely, and in fine-tuning I don't think there is much pressure to learn one.
So perhaps if we first ask a model to think about an encoding scheme and then ask it to use that scheme for the task, it might succeed
Yeah, that would be interesting. I asked Claude. It generated 3 schemes that were pretty bad, and 2 that look promising:
Punctuation scheme:
State 0: End sentence with a period (.)
State 1: End sentence with an exclamation mark (!)
State 2: End sentence with a question mark (?)

Tense scheme:
State 0: Present tense
State 1: Past tense
State 2: Future tense
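For concreteness, here is roughly how the punctuation scheme could be decoded and scored automatically. This is just a minimal sketch in Python: the '.'/'!'/'?' → 0/1/2 mapping comes from the list above, but the naive sentence segmentation and the function names are my assumptions, not the project's actual evaluation code.

```python
import re

# Assumed mapping from the punctuation scheme above.
PUNCT_TO_STATE = {".": 0, "!": 1, "?": 2}

def decode_states(text: str) -> list[int]:
    """Read the encoded state sequence off sentence-final punctuation.

    Naively treats any '.', '!' or '?' followed by whitespace (or the end
    of the text) as a sentence boundary; abbreviations and decimals would
    need real sentence segmentation.
    """
    marks = re.findall(r"([.!?])(?=\s|$)", text)
    return [PUNCT_TO_STATE[m] for m in marks]

# Example: three sentences encoding the state sequence 0, 1, 2.
assert decode_states("The sky is blue. What a day! Is it raining?") == [0, 1, 2]
```

Checking a model's output would then reduce to comparing decode_states(response) against the state sequence the task requires.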
The punctuation one seems especially promising. But neither feels much different from the 3 I tested, so I don't expect them to be somehow "special" to the model.
I reran the experiment with the punctuation scheme. The results look overall the same.