Jérémy Scheurer

Karma: 805

An Opinionated Evals Reading List

15 Oct 2024 14:38 UTC
60 points
0 comments, 13 min read, LW link
(www.apolloresearch.ai)

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

22 Jul 2024 16:17 UTC
69 points
0 comments, 16 min read, LW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
101 points
28 comments, 5 min read, LW link

Apollo Research 1-year update

29 May 2024 17:44 UTC
93 points
0 comments, 7 min read, LW link

We need a Science of Evals

22 Jan 2024 20:30 UTC
71 points
13 comments, 9 min read, LW link

A starter guide for evals

8 Jan 2024 18:24 UTC
50 points
2 comments, 12 min read, LW link
(www.apolloresearch.ai)

Understanding strategic deception and deceptive alignment

25 Sep 2023 16:27 UTC
64 points
16 comments, 7 min read, LW link
(www.apolloresearch.ai)

Announcing Apollo Research

30 May 2023 16:17 UTC
215 points
11 comments, 8 min read, LW link

Imitation Learning from Language Feedback

30 Mar 2023 14:11 UTC
71 points
3 comments, 10 min read, LW link

Practical Pitfalls of Causal Scrubbing

27 Mar 2023 7:47 UTC
87 points
17 comments, 13 min read, LW link