Beth Barnes
Karma: 3,021
Alignment researcher. Views are my own and not those of my employer.
https://www.barnes.page/
Posts
Clarifying METR’s Auditing Role
Beth Barnes · 30 May 2024 18:41 UTC · 108 points · 1 comment · 2 min read · LW link
Introducing METR’s Autonomy Evaluation Resources
Megan Kinniment and Beth Barnes · 15 Mar 2024 23:16 UTC · 90 points · 0 comments · 1 min read · LW link (metr.github.io)
METR is hiring!
Beth Barnes · 26 Dec 2023 21:00 UTC · 65 points · 1 comment · 1 min read · LW link
Bounty: Diverse hard tasks for LLM agents
Beth Barnes and Megan Kinniment · 17 Dec 2023 1:04 UTC · 49 points · 31 comments · 16 min read · LW link
Send us example gnarly bugs
Beth Barnes, Megan Kinniment and Tao Lin · 10 Dec 2023 5:23 UTC · 77 points · 10 comments · 2 min read · LW link
Managing risks of our own work
Beth Barnes · 18 Aug 2023 0:41 UTC · 66 points · 0 comments · 2 min read · LW link
ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Beth Barnes · 1 Aug 2023 18:30 UTC · 153 points · 12 comments · 5 min read · LW link (evals.alignment.org)
More information about the dangerous capability evaluations we did with GPT-4 and Claude.
Beth Barnes · 19 Mar 2023 0:25 UTC · 233 points · 54 comments · 8 min read · LW link (evals.alignment.org)
Reflection Mechanisms as an Alignment Target—Attitudes on “near-term” AI
elandgre, Beth Barnes and Marius Hobbhahn · 2 Mar 2023 4:29 UTC · 21 points · 0 comments · 8 min read · LW link
‘simulator’ framing and confusions about LLMs
Beth Barnes · 31 Dec 2022 23:38 UTC · 104 points · 11 comments · 4 min read · LW link
Reflection Mechanisms as an Alignment target: A follow-up survey
Marius Hobbhahn, elandgre and Beth Barnes · 5 Oct 2022 14:03 UTC · 15 points · 2 comments · 7 min read · LW link
Evaluations project @ ARC is hiring a researcher and a webdev/engineer
Beth Barnes · 9 Sep 2022 22:46 UTC · 99 points · 7 comments · 10 min read · LW link
Help ARC evaluate capabilities of current language models (still need people)
Beth Barnes · 19 Jul 2022 4:55 UTC · 95 points · 6 comments · 2 min read · LW link
Reflection Mechanisms as an Alignment target: A survey
Marius Hobbhahn, elandgre and Beth Barnes · 22 Jun 2022 15:05 UTC · 32 points · 1 comment · 14 min read · LW link
Another list of theories of impact for interpretability
Beth Barnes · 13 Apr 2022 13:29 UTC · 33 points · 1 comment · 5 min read · LW link
Reverse-engineering using interpretability
Beth Barnes · 29 Dec 2021 23:21 UTC · 21 points · 2 comments · 5 min read · LW link
Risks from AI persuasion
Beth Barnes · 24 Dec 2021 1:48 UTC · 76 points · 15 comments · 31 min read · LW link
Some thoughts on why adversarial training might be useful
Beth Barnes · 8 Dec 2021 1:28 UTC · 9 points · 6 comments · 3 min read · LW link
Considerations on interaction between AI and expected value of the future
Beth Barnes · 7 Dec 2021 2:46 UTC · 68 points · 28 comments · 4 min read · LW link
More detailed proposal for measuring alignment of current models
Beth Barnes · 20 Nov 2021 0:03 UTC · 31 points · 0 comments · 8 min read · LW link