Scalable Oversight
Tag · Last edit: 18 Apr 2024 19:57 UTC by Raemon
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud, Jacob G-W, Evzen, Joseph Miller and TurnTrout · 6 Dec 2024 22:19 UTC · 150 points · 11 comments · 11 min read · (arxiv.org)
AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan · 24 Aug 2024 22:30 UTC · 21 points · 0 comments · 74 min read
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks · 18 Apr 2024 16:17 UTC · 107 points · 10 comments · 12 min read
Scalable oversight as a quantitative rather than qualitative problem
Buck · 6 Jul 2024 17:42 UTC · 85 points · 11 comments · 3 min read
Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery, Abhimanyu Pallavi Sudhir and JacksonKaunismaa · 6 Aug 2024 17:44 UTC · 31 points · 0 comments · 2 min read
Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir · 16 Sep 2024 1:04 UTC · 5 points · 1 comment · 5 min read
NYU Code Debates Update/Postmortem
David Rein · 24 May 2024 16:08 UTC · 27 points · 4 comments · 10 min read
On scalable oversight with weak LLMs judging strong LLMs
zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner and Rohin Shah · 8 Jul 2024 8:59 UTC · 49 points · 18 comments · 7 min read · (arxiv.org)
Automated monitoring systems
hiki_t · 28 Nov 2024 18:54 UTC · 1 point · 0 comments · 2 min read
No comments.