Dan Braun

Karma: 992

Dan Braun’s Shortform

Dan Braun, 5 Oct 2024 12:26 UTC
5 points
18 comments, 1 min read

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

18 Jul 2024 14:15 UTC
117 points
18 comments, 18 min read

Apollo Research 1-year update

29 May 2024 17:44 UTC
93 points
0 comments, 7 min read

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

20 May 2024 17:53 UTC
105 points
4 comments, 3 min read

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

17 May 2024 16:25 UTC
57 points
10 comments, 4 min read
(arxiv.org)

Understanding strategic deception and deceptive alignment

25 Sep 2023 16:27 UTC
64 points
16 comments, 7 min read
(www.apolloresearch.ai)