Singularian2501
Karma: 9
I like reading machine learning papers.
Paper: Identifying the Risks of LM Agents with an LM-Emulated Sandbox—University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases!
Singularian2501
9 Oct 2023 0:00 UTC
6 points, 0 comments, 1 min read
RAIN: Your Language Models Can Align Themselves without Finetuning—Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!
Singularian2501
24 Sep 2023 16:48 UTC
5 points, 0 comments, 1 min read