Somewhat related: I just published the LLM Deceptiveness and Gullibility Benchmark. This benchmark evaluates both a model's ability to generate convincing disinformation and its resilience against deceptive arguments. The analysis covers 19,000 questions and arguments derived from provided articles.