JanB

Karma: 816

Paper in Science: Managing extreme AI risks amid rapid progress

JanB, 23 May 2024 8:40 UTC
50 points
2 comments, 1 min read, LW link

I don’t find the lie detection results that surprising (by an author of the paper)

JanB, 4 Oct 2023 17:10 UTC
97 points
8 comments, 3 min read, LW link

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

28 Sep 2023 18:53 UTC
187 points
38 comments, 3 min read, LW link