mrinank_sharma
Karma: 108
Towards Understanding Sycophancy in Language Models
Ethan Perez, mrinank_sharma, Meg and Tomek Korbak
24 Oct 2023 0:30 UTC, 66 points, 0 comments, 2 min read, link (arxiv.org)
Paper: Understanding and Controlling a Maze-Solving Policy Network
TurnTrout, Ulisse Mini, peligrietzer, mrinank_sharma, Austin Meek, Monte M and lisathiergart
13 Oct 2023 1:38 UTC, 70 points, 0 comments, 1 min read, link (arxiv.org)