Kshitij Sachan

Karma: 342

Redwood Research

AI Control: Improving Safety Despite Intentional Subversion

13 Dec 2023 15:51 UTC
235 points
24 comments · 10 min read · LW link · 4 reviews

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan · 10 Aug 2023 0:48 UTC
66 points
35 comments · 6 min read · LW link