RSS

Kshitij Sachan

Karma: 334

Redwood Research

AI Con­trol: Im­prov­ing Safety De­spite In­ten­tional Subversion

13 Dec 2023 15:51 UTC
228 points
18 comments10 min readLW link1 review

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan10 Aug 2023 0:48 UTC
66 points
35 comments6 min readLW link