John Hughes

Karma: 437

Former MATS scholar working on scalable oversight and adversarial robustness.

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

Apr 8, 2025, 5:32 PM
145 points
20 comments, 12 min read, LW link

Tips and Code for Empirical Research Workflows

Jan 20, 2025, 10:31 PM
94 points
14 comments, 20 min read, LW link

Tips On Empirical Research Slides

Jan 8, 2025, 5:06 AM
90 points
4 comments, 6 min read, LW link

Best-of-N Jailbreaking

Dec 14, 2024, 4:58 AM
78 points
5 comments, 2 min read, LW link
(arxiv.org)

Debating with More Persuasive LLMs Leads to More Truthful Answers

Feb 7, 2024, 9:28 PM
89 points
14 comments, 9 min read, LW link
(arxiv.org)