SoerenMind

Karma: 1,181

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Sep 28, 2023, 6:53 PM
187 points
39 comments · 3 min read · LW link · 1 review

Wikipedia as an introduction to the alignment problem

SoerenMind · May 29, 2023, 6:43 PM
83 points
10 comments · 1 min read · LW link
(en.wikipedia.org)

The Alignment Problem from a Deep Learning Perspective (major rewrite)

Jan 10, 2023, 4:06 PM
84 points
8 comments · 39 min read · LW link
(arxiv.org)