Osaid Nasir

Karma: 0

Sr Applied Research Engineer at LinkedIn

AI Alignment, Red Teaming, Safety

Osaid Nasir 5 Sep 2024 6:46 UTC
1 point
0
in reply to: Rohin Shah’s comment on: AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
since accuracies aren’t near-100% we know there are some cases the model hasn’t memorized, so the mechanism you suggest doesn’t apply to those inputs
That makes sense.
I suspect the prompts are a bigger deal
Do you suppose replicating these experiments with LLM debaters/judges of different sizes could serve as a suitable proxy for prompt quality? Let’s say P is the optimal prompt and Q is a suboptimal one; then LLM performance with prompt Q ≤ LLM performance with prompt P ≤ bigger LLM performance with prompt Q.
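To make the ordering above concrete, here is a minimal sketch of how it could be checked once accuracies are measured. The function name, the "small"/"big" labels, and the numbers are all hypothetical placeholders chosen for illustration; they are not results from the paper or from any experiment.

```python
from typing import Dict, Tuple

# Keys are (model_size, prompt) pairs; values are judge accuracies in [0, 1].
Accuracies = Dict[Tuple[str, str], float]


def ordering_holds(
    acc: Accuracies,
    small: str = "small",
    big: str = "big",
    good_prompt: str = "P",
    weak_prompt: str = "Q",
) -> bool:
    """Return True if acc[small, Q] <= acc[small, P] <= acc[big, Q]."""
    return (
        acc[(small, weak_prompt)]
        <= acc[(small, good_prompt)]
        <= acc[(big, weak_prompt)]
    )


# Placeholder numbers, made up purely to exercise the check (not real results).
example: Accuracies = {
    ("small", "Q"): 0.60,
    ("small", "P"): 0.70,
    ("big", "Q"): 0.75,
}

# A second made-up case where the bigger model with prompt Q does worse.
counter_example: Accuracies = {
    ("small", "Q"): 0.60,
    ("small", "P"): 0.70,
    ("big", "Q"): 0.65,
}

print(ordering_holds(example))
print(ordering_holds(counter_example))
```

If the ordering fails on real measurements, model size would not be a usable stand-in for prompt quality in that setting.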

Osaid Nasir 2 Sep 2024 20:47 UTC
1 point
0
in reply to: Rohin Shah’s comment on: AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Oh, that’s interesting. Wouldn’t that slightly bias the results? For example, the paper claims no advantage of debate over QA without article. Intuitively, if the weak LLM isn’t pretrained on QA without article, then debate should work better than consultancy. On the other hand, if it is, then intuitively there should be no difference between debate and consultancy, which is what the team observes. What do you think?

Osaid Nasir 28 Aug 2024 10:59 UTC
1 point
0
in reply to: Rohin Shah’s comment on: AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Ah that makes sense, thank you.
Did the team also ensure that there wasn’t any data leakage between the tasks being evaluated and the pretraining data? For context, I’m thinking of replicating the results with Llama, so I’m wondering about the same thing.

Osaid Nasir 27 Aug 2024 10:31 UTC
1 point
0
in reply to: Rohin Shah’s comment on: AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
My apologies, I didn’t frame my question correctly.
Our current work is looking into training our LLM judges to be better proxies of human judges
My understanding from this statement is that the team plans to finetune weak LLMs on human judges’ decisions and then use them as judges for strong LLM debates. This makes sense right now, when human judges are able to assess strong LLM debates fairly robustly.
What happens when we want to use a Weak LLM as a judge but there is no accurate or good enough human judge? At that point we won’t be able to finetune the Weak LLM because there is no good human judge. Do we assume that at that stage the Weak LLM itself will be pretty robust?

Osaid Nasir 26 Aug 2024 9:31 UTC
1 point
0
on: AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Our current work is looking into training our LLM judges to be better proxies of human judges
How does this scale to superintelligent AI capabilities? Wouldn’t debate be severely restricted by a lack of accurate human judges at that point? Or is the idea akin to weak-to-strong generalisation, wherein the human judge acts as a weak teacher at that point?