Magdalena Wache
Karma: 537
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn
20 May 2024 17:53 UTC | 105 points | 4 comments | 3 min read
Interpretability Externalities Case Study—Hungry Hungry Hippos
Magdalena Wache
20 Sep 2023 14:42 UTC | 64 points | 22 comments | 2 min read
Technical AI Safety Research Landscape [Slides]
Magdalena Wache
18 Sep 2023 13:56 UTC | 41 points | 0 comments | 4 min read
AI Safety Europe Retreat 2023 Retrospective
Magdalena Wache
14 Apr 2023 9:05 UTC | 43 points | 0 comments | 2 min read
Finite Factored Sets in Pictures
Magdalena Wache
11 Dec 2022 18:49 UTC | 174 points | 35 comments | 12 min read