AlexMeinke

Karma: 559

Ablations for “Frontier Models are Capable of In-context Scheming”

Dec 17, 2024, 11:58 PM
115 points
1 comment, 2 min read, LW link

Frontier Models are Capable of In-context Scheming

Dec 5, 2024, 10:11 PM
203 points
24 comments, 7 min read, LW link

Training AI agents to solve hard problems could lead to Scheming

Nov 19, 2024, 12:10 AM
61 points
12 comments, 28 min read, LW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

Jul 8, 2024, 10:24 PM
109 points
37 comments, 5 min read, LW link

Apollo Research 1-year update

May 29, 2024, 5:44 PM
93 points
0 comments, 7 min read, LW link

A starter guide for evals

Jan 8, 2024, 6:24 PM
53 points
2 comments, 12 min read, LW link
(www.apolloresearch.ai)

Paper: Tell, Don’t Show- Declarative facts influence how LLMs generalize

Dec 19, 2023, 7:14 PM
45 points
4 comments, 6 min read, LW link
(arxiv.org)