HarrietW
Karma: 39
Cooperation and Alignment in Delegation Games: You Need Both!
Oliver Sourbut, Lewis Hammond and HarrietW
3 Aug 2024 10:16 UTC, 7 points, 0 comments, 14 min read (www.oliversourbut.net)
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown
8 Nov 2023 11:37 UTC, 49 points, 0 comments, 18 min read