Lukas Finnveden comments on DeepMind: Evaluating Frontier Models for Dangerous Capabilities

Lukas Finnveden 8 Aug 2024 18:12 UTC
2 points
Incidentally: Were the persuasion evals done on models with honesty training or on helpfulness-only models? (Couldn’t find this in the paper, sorry if I missed it.)
- Rohin Shah 8 Aug 2024 21:26 UTC
  2 points
  I don’t know the exact details, but to my knowledge we didn’t have trouble getting the model to lie (e.g. for Web of Lies).