rgorman
Karma: 206
Using Prompt Evaluation to Combat Bio-Weapon Research
Stuart_Armstrong and rgorman
19 Feb 2025 12:39 UTC, 11 points, 2 comments, 3 min read
Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation
Stuart_Armstrong and rgorman
31 Jan 2025 15:36 UTC, 16 points, 2 comments, 2 min read
Concept extrapolation for hypothesis generation
Stuart_Armstrong, Patrick Leask and rgorman
12 Dec 2022 22:09 UTC, 20 points, 2 comments, 3 min read
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong and rgorman
6 Dec 2022 19:54 UTC, 170 points, 85 comments, 9 min read