James Chua
Hi Thane. Thank you for the helpful comments so far! You are right to think about this SGD-shortcut. Let me see if I am following the claim correctly.
Claim: The ground-truth that we evaluate against, the “object-level question / answer” is very similar to the hypothetical question.
Claimed Object-level Question: “What is the next country: Laos, Peru, Fiji. What would be the second letter of your response?”
Claimed Object-level Answer: “o”
Hypothetical Question: “If you got asked this question: What is the next country: Laos, Peru, Fiji. What would be the second letter of your response?”
Hypothetical Answer: “o”
The argument is that the model simply ignores “If you got asked this question”, so it's trivial for M1 to win against M2.
If our object-level question is what is being claimed, I would agree with you that the model would simply learn to ignore the added hypothetical question. However, this is our actual object-level question.
Our Object-level question: “What is the next country: Laos, Peru, Fiji. What would be your response?”
Our Object-level Answer: “Honduras”.
What the model outputs as our object-level answer, “Honduras”, is quite different from the hypothetical answer, “o”.
Am I following your claim correctly?
Some people (my mentor Ethan Perez) said my weekly MATS research update slides were nice. Some rough tips I have:
Mentors often have a lot of projects they are working on. At the start of your slides, recap the takeaways from last week and explain any jargon you use.
Keep graphs simple. As a rule of thumb, it gets quite confusing when you have >= 4 categories / colors to look at. Are all these categories important? Maybe just show the most important two. Keep the other categories as a backup slide in case Ethan wants the breakdown. One graph, one story to take away.
Highlight what to look at in the chart. E.g. if you have a line chart of model loss, draw a red arrow that says “Model loss goes down, that's what we want!”.
Show the prompt you are calling the model with.
If you have someone to show to (e.g. random people over lunch), show your slides. These people are going to have much less context on what you are working on, so if they can actually understand your slides, it's a great signal that Ethan is going to understand them. Showing them to other Ethan collaborators also helps: ask them to model what Ethan would say.
When I first started working with Ethan and improving my slides, it took me around 2-3 days to do it, so I suggest starting early. This seems like a long time, but it includes asking my collaborators to critique my slides, then improving my plots and running more experiments to address their feedback. I think it was a worthwhile investment! (After a while I got better at this, so I take less time to iterate on this process.)
Yep! I was very pleasantly surprised that Love/Hate worked for Llama at all. It’s great that you rewrote it without TransformerLens too, since TransformerLens has issues with 8-bit / 4-bit quantisation.
I also sent you a DM on Discord! I’d be interested to read any rough findings and lessons you have with Llama.
I managed to get it working for llama-7b on colab after some debugging.
Surprisingly, it actually does work for the Love / Hate scenario, but not for some others like Rome vs Paris.
Here's the link if anyone wants to try it.
https://colab.research.google.com/drive/1ACAA7FO8zc4pFAqPdaPshoy4WWXCvUTQ?usp=sharing
edit: seems like you guys already have a better version here. https://github.com/UlisseMini/activation_additions_hf/blob/main/notebooks/qualitative.ipynb
Never mind! (I’m still keeping this comment for visibility if anyone wants to try.)
Thank you. If I am done with one mentor's questions but am still writing the response for another, should I submit the first mentor’s questions first? Or is it better for administrative purposes to wait until I am ready for both, and submit them in the same form?
Clicking on Owain Evans in the application doesn’t show the mentor’s questions, unlike the rest of the mentors. I think this is a bug?
For DTs it's really just a linear function that converts the scalar reward into the same dimensions as the token embeddings.
So e.g. a single token’s embedding has a hidden state of size 1024.
We can learn a linear function that takes this scalar and outputs something of size 1024.
The more annoying (PITA) part was offsetting the positional/attention masks/labels for this.
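A minimal sketch of that scalar-to-embedding map, assuming "DTs" refers to Decision Transformers. The class name `ScalarToEmbedding` and the random initialisation are hypothetical placeholders: in practice the weights would be learned, and in PyTorch the same map is a `torch.nn.Linear(1, 1024)` layer. It does not cover the mask/label offsetting.

```python
import random

HIDDEN_SIZE = 1024  # same size as the token embeddings in the example above


class ScalarToEmbedding:
    """Linear map from one scalar to a vector: out[i] = weight[i] * reward + bias[i]."""

    def __init__(self, hidden_size, seed=0):
        rng = random.Random(seed)
        # Placeholder initialisation; a real model would learn these by gradient descent.
        self.weight = [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
        self.bias = [0.0] * hidden_size

    def __call__(self, reward):
        return [w * reward + b for w, b in zip(self.weight, self.bias)]


embed_reward = ScalarToEmbedding(HIDDEN_SIZE)
reward_embedding = embed_reward(3.5)
print(len(reward_embedding))  # 1024
```

The resulting vector can then sit in the input sequence alongside the ordinary token embeddings, since it has the same size.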
I do agree. I think there are two product use cases with instruct that have distinct optimal levels of entropy.
1. The more explorative use cases you have mentioned, and cases where users do want diversity, e.g. generating story ideas.
2. Having factual / accurate answers.
I’m not sure how exactly OpenAI set their “KL budgets” for davinci instruct.
For WebGPT3 they “compared a couple of KL budgets using human evaluations”, and those evaluations were for how factual the answers were. So in that scenario, we’ll see a KL budget that optimizes for 2, since the users don’t care about the diversity of multiple generations. They just care about the factual quality of a single generation.
Now I’m interested to see what happens if we somehow change the evaluations so that users are shown, e.g., 3 examples from each model, in a scenario where diversity is desirable (e.g. generating story ideas). Then, in deciding on the KL budget, we will probably get a much lower number. And that would allow them to serve a model more suited to use case 1.
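To make the entropy / KL trade-off concrete, here is a toy sketch. It is not how OpenAI trains or evaluates anything: the four-outcome `base` distribution is made up, and `sharpen` (a temperature-style reweighting) is only a hypothetical stand-in for fine-tuning. It shows that moving further from the base distribution in this way costs more KL and leaves less entropy, which is why a smaller KL budget would tend to preserve diversity.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) in nats, for distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sharpen(p, temperature):
    """Raise probabilities to the power 1/temperature and renormalise."""
    weights = [pi ** (1.0 / temperature) for pi in p]
    total = sum(weights)
    return [w / total for w in weights]


base = [0.4, 0.3, 0.2, 0.1]  # made-up "base model" distribution over 4 completions

for temperature in (1.0, 0.5, 0.25):
    tuned = sharpen(base, temperature)
    print(temperature, round(kl_divergence(tuned, base), 3), round(entropy(tuned), 3))
```

As the temperature drops, the printed KL from the base distribution grows while the entropy shrinks.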
Hi Archimedes. Thanks for sparking this discussion—it’s helpful!
I’ve written a reply to Thane here on a similar question.
Does that make sense?
In short, the ground-truth (the object-level) answer is quite different from the hypothetical question. It is not a simple rephrasing, since it requires an additional computation of a property. (Maybe we disagree on that?)
Our Object-level question: “What is the next country: Laos, Peru, Fiji. What would be your response?”
Our Object-level Answer: “Honduras”.
Hypothetical Question: “If you got asked this question: What is the next country: Laos, Peru, Fiji. What would be the second letter of your response?”
Hypothetical Answer: “o”
The object-level answer “Honduras” and the hypothetical answer “o” are quite different from each other. The main point of the hypothetical is that the model needs to compute an additional property: “What would be the second letter of your response?”. The model cannot simply ignore “If you got asked this question” to get the hypothetical answer correct.
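The relationship between the two answers can be sketched as below. This is an illustration only, not the actual evaluation code: `nth_letter` and `score_hypothetical` are hypothetical helper names, and the property used is the second letter, which for “Honduras” is “o”.

```python
def nth_letter(response: str, n: int) -> str:
    """Return the n-th letter of a response, counting from 1."""
    return response[n - 1]


def score_hypothetical(hypothetical_answer: str, object_level_answer: str, n: int = 2) -> bool:
    """The hypothetical answer is correct if it matches the property of the object-level answer."""
    target = nth_letter(object_level_answer, n)
    return hypothetical_answer.strip().lower() == target.lower()


object_level_answer = "Honduras"

print(nth_letter(object_level_answer, 2))                     # o
print(score_hypothetical("o", object_level_answer))           # True
print(score_hypothetical("Honduras", object_level_answer))    # False
```

A model that ignores the hypothetical framing and just continues the list would output something like “Honduras”, which scores as incorrect here.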