
Nora Belrose

Karma: 700

Estimating the Probability of Sampling a Trained Neural Network at Random

Mar 1, 2025, 2:11 AM
30 points
7 comments · 1 min read · LW link
(arxiv.org)

Mechanistic Anomaly Detection Research Update

Aug 6, 2024, 10:33 AM
11 points
0 comments · 1 min read · LW link
(blog.eleuther.ai)

Open Source Automated Interpretability for Sparse Autoencoder Features

Jul 30, 2024, 9:11 PM
67 points
1 comment · 13 min read · LW link
(blog.eleuther.ai)

Deconstructing Bostrom’s Classic Argument for AI Doom

Mar 11, 2024, 5:58 AM
16 points
14 comments · 1 min read · LW link
(www.youtube.com)