I’m a CS master’s student at ENS Paris-Saclay. I want to pursue a career in AI safety research.
MATS 7.0 Scholar in Neel Nanda’s stream
Would you expect that we can extract XORs from small models like Pythia-70m under your hypothesis?
I disagree; it could be beneficial for a base model to identify when a character is making false claims, enabling the prediction of such claims in the future.
Let’s assume the prompt template is Q [true/false] [banana/shred]
If I understand correctly, they don’t claim the probe learned has_banana, but that it learned has_banana ⊕ label. Moreover, evaluating has_banana ⊕ label for label = false gives:

has_banana ⊕ false = has_banana

Therefore, we can learn a probe that is a banana classifier.
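To spell out the construction, here is a minimal numpy sketch (illustrative names only, assuming the probe has learned has_banana ⊕ label rather than has_banana itself): since the label is visible in the prompt, XOR-ing it back into the probe's output recovers has_banana.

```python
import numpy as np

# Toy demonstration (hypothetical names, not the authors' code).
rng = np.random.default_rng(0)
has_banana = rng.integers(0, 2, size=1000)  # 1 iff the distractor word is "banana"
label = rng.integers(0, 2, size=1000)       # 1 iff the prompt says "true"

probe_pred = has_banana ^ label             # stand-in for a probe that learned the XOR feature

# XOR-ing the known label back in recovers has_banana exactly.
banana_classifier = probe_pred ^ label
assert (banana_classifier == has_banana).all()
```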
Small typo in ## Interference arbiters collisions between features:

> by taking aninner productt with .
Hi Nathan, I’m not sure if I understand your critique correctly. The algorithm we describe does not try to “maximize the expected likelihood of harvesting X apples”. It tries to find a policy that, given its current knowledge of the world, will achieve an expected return of X apples. That is, it does not care about the probability of getting exactly X apples, but rather the average number of apples it will get over many trials. Does that make sense?
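A toy example of the distinction (my own numbers, not from the post): a policy judged by expected return can meet the target while essentially never yielding exactly that many apples.

```python
import random

random.seed(0)
TARGET = 10  # hypothetical target: an expected return of 10 apples

# Hypothetical policy: each episode yields either 0 or 20 apples with equal probability.
returns = [random.choice([0, 20]) for _ in range(100_000)]

mean_return = sum(returns) / len(returns)                # ~10, so the expected-return target is met
p_exactly_target = returns.count(TARGET) / len(returns)  # 0.0, exactly 10 apples never happens

print(f"average return = {mean_return:.2f}, P(return == {TARGET}) = {p_exactly_target:.2f}")
```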
You can get ~75% accuracy just by computing the OR. But we found that it only achieves better than 75% at the last layer and at step 16000 of Pythia-70m training; see this video.
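For intuition on that ~75% baseline (a quick check, assuming the two features are roughly balanced and independent): OR agrees with XOR on 3 of the 4 input combinations.

```python
import itertools

# OR and XOR disagree only on the (1, 1) input, so on balanced data
# predicting a OR b for the target a XOR b is right ~75% of the time.
pairs = list(itertools.product([0, 1], repeat=2))
matches = sum((a | b) == (a ^ b) for a, b in pairs)
print(f"OR matches XOR on {matches}/{len(pairs)} inputs ({matches / len(pairs):.0%})")
```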