John Hughes

Karma: 143

Former MATS scholar working on scalable oversight and adversarial robustness.

Best-of-N Jailbreaking

14 Dec 2024 4:58 UTC
77 points
6 comments · 2 min read · LW link
(arxiv.org)

Debating with More Persuasive LLMs Leads to More Truthful Answers

7 Feb 2024 21:28 UTC
88 points
14 comments · 9 min read · LW link
(arxiv.org)