rgorman

Karma: 206

Using Prompt Evaluation to Combat Bio-Weapon Research

Stuart_Armstrong and rgorman

19 Feb 2025 12:39 UTC

11 points

2 comments3 min readLW link

Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation

Stuart_Armstrong and rgorman

31 Jan 2025 15:36 UTC

16 points

2 comments2 min readLW link

Concept extrapolation for hypothesis generation

Stuart_Armstrong, Patrick Leask and rgorman

12 Dec 2022 22:09 UTC

20 points

2 comments3 min readLW link

Using GPT-Eliezer against ChatGPT Jailbreaking

Stuart_Armstrong and rgorman

6 Dec 2022 19:54 UTC

170 points

85 comments9 min readLW link