Andy Arditi

Karma: 619

https://andyrdt.com

Do models say what they learn?

Mar 22, 2025, 3:19 PM
110 points
8 comments
13 min read
LW link

Finding Features Causally Upstream of Refusal

Jan 14, 2025, 2:30 AM
48 points
5 comments
12 min read
LW link

AI as systems, not just models

Dec 21, 2024, 11:19 PM
28 points
0 comments
7 min read
LW link
(andyrdt.com)

Unlearning via RMU is mostly shallow

Jul 23, 2024, 4:07 PM
50 points
3 comments
6 min read
LW link