rusheb

Karma: 386

Ablations for “Frontier Models are Capable of In-context Scheming”

Dec 17, 2024, 11:58 PM
115 points
1 comment · 2 min read · LW link

Frontier Models are Capable of In-context Scheming

Dec 5, 2024, 10:11 PM
203 points
24 comments · 7 min read · LW link

Apollo Research 1-year update

May 29, 2024, 5:44 PM
93 points
0 comments · 7 min read · LW link

A starter guide for evals

Jan 8, 2024, 6:24 PM
53 points
2 comments · 12 min read · LW link
(www.apolloresearch.ai)

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

Nov 7, 2023, 5:59 PM
38 points
2 comments · 2 min read · LW link
(arxiv.org)

Understanding mesa-optimization using toy models

May 7, 2023, 5:00 PM
43 points
2 comments · 10 min read · LW link