
Beth Barnes

Karma: 3,027

Alignment researcher. Views are my own and not those of my employer. https://www.barnes.page/

Clarifying METR’s Auditing Role

Beth Barnes · May 30, 2024, 6:41 PM
108 points
1 comment · 2 min read · LW link

Introducing METR’s Autonomy Evaluation Resources

Mar 15, 2024, 11:16 PM
90 points
0 comments · 1 min read · LW link
(metr.github.io)

METR is hiring!

Beth Barnes · Dec 26, 2023, 9:00 PM
65 points
1 comment · 1 min read · LW link

Bounty: Diverse hard tasks for LLM agents

Dec 17, 2023, 1:04 AM
49 points
31 comments · 16 min read · LW link

Send us example gnarly bugs

Dec 10, 2023, 5:23 AM
77 points
10 comments · 2 min read · LW link

Managing risks of our own work

Beth Barnes · Aug 18, 2023, 12:41 AM
66 points
0 comments · 2 min read · LW link

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes · Aug 1, 2023, 6:30 PM
153 points
12 comments · 5 min read · LW link
(evals.alignment.org)

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes · Mar 19, 2023, 12:25 AM
233 points
54 comments · 8 min read · LW link
(evals.alignment.org)

Reflection Mechanisms as an Alignment Target—Attitudes on “near-term” AI

Mar 2, 2023, 4:29 AM
21 points
0 comments · 8 min read · LW link

‘simulator’ framing and confusions about LLMs

Beth Barnes · Dec 31, 2022, 11:38 PM
104 points
11 comments · 4 min read · LW link

Reflection Mechanisms as an Alignment target: A follow-up survey

Oct 5, 2022, 2:03 PM
15 points
2 comments · 7 min read · LW link

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes · Sep 9, 2022, 10:46 PM
99 points
7 comments · 10 min read · LW link

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes · Jul 19, 2022, 4:55 AM
95 points
6 comments · 2 min read · LW link

Reflection Mechanisms as an Alignment target: A survey

Jun 22, 2022, 3:05 PM
32 points
1 comment · 14 min read · LW link

Another list of theories of impact for interpretability

Beth Barnes · Apr 13, 2022, 1:29 PM
33 points
1 comment · 5 min read · LW link

Reverse-engineering using interpretability

Beth Barnes · Dec 29, 2021, 11:21 PM
21 points
2 comments · 5 min read · LW link

Risks from AI persuasion

Beth Barnes · Dec 24, 2021, 1:48 AM
76 points
15 comments · 31 min read · LW link

Some thoughts on why adversarial training might be useful

Beth Barnes · Dec 8, 2021, 1:28 AM
9 points
6 comments · 3 min read · LW link

Considerations on interaction between AI and expected value of the future

Beth Barnes · Dec 7, 2021, 2:46 AM
68 points
28 comments · 4 min read · LW link

More detailed proposal for measuring alignment of current models

Beth Barnes · Nov 20, 2021, 12:03 AM
31 points
0 comments · 8 min read · LW link