
Scalable Oversight

Tag. Last edit: 18 Apr 2024 19:57 UTC by Raemon

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

6 Dec 2024 22:19 UTC
150 points
11 comments, 11 min read, LW link
(arxiv.org)

AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization

DanielFilan, 24 Aug 2024 22:30 UTC
21 points
0 comments, 74 min read, LW link

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks, 18 Apr 2024 16:17 UTC
107 points
10 comments, 12 min read, LW link

Scalable oversight as a quantitative rather than qualitative problem

Buck, 6 Jul 2024 17:42 UTC
85 points
11 comments, 3 min read, LW link

Inference-Only Debate Experiments Using Math Problems

6 Aug 2024 17:44 UTC
31 points
0 comments, 2 min read, LW link

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets

Abhimanyu Pallavi Sudhir, 16 Sep 2024 1:04 UTC
5 points
1 comment, 5 min read, LW link

NYU Code Debates Update/Postmortem

David Rein, 24 May 2024 16:08 UTC
27 points
4 comments, 10 min read, LW link

On scalable oversight with weak LLMs judging strong LLMs

8 Jul 2024 8:59 UTC
49 points
18 comments, 7 min read, LW link
(arxiv.org)

Automated monitoring systems

hiki_t, 28 Nov 2024 18:54 UTC
1 point
0 comments, 2 min read, LW link