How did you feed the data into the model and get predictions? Was there a prompt and then you got the model’s answer? Then you got the logits from the API? What was the prompt?
...that would probably be a good thing to mention in the methodology section 😊
You’re correct on all counts. I’m doing it in the simplest possible way (0 bits of optimization on prompting):
"<essay-text>"
Is the author of the preceding text male or female?
(with slight changes for the different categories, of course, eg ‘...straight, bisexual, or gay?’ for sexuality)
There’s also a system prompt, also non-optimized, mainly intended to push it toward one-word answers:
You are a helpful assistant who helps determine information about the author of texts. You only ever answer with a single word: one of the exact choices the user provides.
I actually started out using pure completion, but OpenAI changed their API so I could no longer get non-top-n logits, so I switched to the chat API. And yes, I’m pulling the top few logits, which essentially always include the desired labels.
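Concretely, the chat-API version looks roughly like this (a minimal sketch rather than the exact script, which is linked further down; the model name is a placeholder):

```python
import math
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a helpful assistant who helps determine information about the author "
    "of texts. You only ever answer with a single word: one of the exact choices "
    "the user provides."
)

def label_probs(essay: str, question: str, model: str = "gpt-4") -> dict[str, float]:
    """Ask the question about the essay and return probabilities for the top answer tokens."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f'"{essay}"\n\n{question}'},
        ],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,  # the desired labels essentially always show up in the top few
    )
    top = resp.choices[0].logprobs.content[0].top_logprobs
    return {t.token.strip().lower(): math.exp(t.logprob) for t in top}

# e.g. label_probs(essay, "Is the author of the preceding text male or female?")
```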
To work around the top-n restriction, you can supply a logit_bias map to the API.
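Roughly like this (a sketch against the completions endpoint, with hypothetical label tokens):

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-3.5-turbo-instruct")

# Hypothetical label tokens; the leading space matters for completion models.
label_ids = [enc.encode(t)[0] for t in [" male", " female"]]

prompt = '"<essay-text>"\n\nIs the author of the preceding text male or female?'

resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    max_tokens=1,
    logprobs=5,
    # Bias the label tokens upward so they land in the returned top logprobs.
    logit_bias={str(tid): 100 for tid in label_ids},
)
print(resp.choices[0].logprobs.top_logprobs[0])  # {token: logprob, ...}
```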
That used to work, but as of March you can only get the pre-logit_bias logprobs back. They didn’t announce the change, but it’s discussed in the OpenAI forums eg here. I noticed the change when all my code suddenly broke; you can still see remnants of that approach in the code.
They emailed some people about this: https://x.com/brianryhuang/status/1763438814515843119
The reason is that it may allow unembedding matrix weight stealing: https://arxiv.org/abs/2403.06634
I’m aware of the paper because of the impact it had. I might personally not have chosen to draw their attention to the issue, since the main effect seems to be making some research significantly more difficult, and I haven’t heard of any attempts to deliberately exfiltrate weights that this would be preventing.
On reflection I somewhat endorse pointing the risk out after discovering it, in the spirit of open collaboration, as you did. It was just really frustrating when all my experiments suddenly broke for no apparent reason. But that’s mostly on OpenAI for not announcing the change to their API (other than emails sent to some few people). Apologies for grouching in your direction.
If you are using llama you can use https://github.com/wassname/prob_jsonformer, or snippets of its code, to get probabilities over a selection of tokens.
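Or, skipping the library, the underlying idea in plain transformers looks roughly like this (a sketch; the model name and label tokens are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; any HF causal LM works the same way.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = '"<essay-text>"\n\nIs the author of the preceding text male or female?\nAnswer:'
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token
probs = torch.softmax(next_token_logits.float(), dim=-1)

# Probability mass on the first token of each candidate label.
for label in [" male", " female"]:
    tid = tok.encode(label, add_special_tokens=False)[0]
    print(label, probs[tid].item())
```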
Thanks! It was actually on my to-do list for this coming week to look for something like this for llama, it’s great to have it come to me 😁
Feel free to suggest improvements; it's just what worked for me, and it's limited in format.
Honestly, the code linked is not that complicated: https://github.com/eggsyntax/py-user-knowledge/blob/aa6c5e57fbd24b0d453bb808b4cc780353f18951/openai_uk.py#L11