I have to say, it does seem a bit implausible to me that even GPT-3.5-turbo is unable to do so when GPT-3.5 is so powerful and your steganography instance is such a simple one.
The “poor man’s RL” scheme I used here is quite weak (and perhaps poorly tuned). I suspect this is the primary issue, though limitations on the OpenAI finetuning API as you described could also be an issue.
The “poor man’s RL” scheme I used here is quite weak (and perhaps poorly tuned). I suspect this is the primary issue, though limitations on the OpenAI finetuning API as you described could also be an issue.