
Andy Arditi

Karma: 627

https://andyrdt.com

Do models say what they learn?

Mar 22, 2025, 3:19 PM
113 points
12 comments, 13 min read

Finding Features Causally Upstream of Refusal

Jan 14, 2025, 2:30 AM
48 points
5 comments, 12 min read

AI as systems, not just models

Andy Arditi, Dec 21, 2024, 11:19 PM
28 points
0 comments, 7 min read
(andyrdt.com)

Unlearning via RMU is mostly shallow

Jul 23, 2024, 4:07 PM
52 points
3 comments, 6 min read