
METR (org)

Last edit: Jul 1, 2024, 6:47 PM by Ruby

Formerly ARC Evals

Review of METR’s public evaluation protocol

Jun 30, 2024, 10:03 PM
10 points
0 comments
5 min read

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes
Aug 1, 2023, 6:30 PM
153 points
12 comments
5 min read
(evals.alignment.org)

METR is hiring!

Beth Barnes
Dec 26, 2023, 9:00 PM
65 points
1 comment
1 min read

Reactions to METR task length paper are insane

Cole Wyeth
Apr 10, 2025, 5:13 PM
47 points
39 comments
4 min read

Improved visualizations of METR Time Horizons paper.

LDJ
Mar 19, 2025, 11:36 PM
20 points
4 comments
2 min read

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King
Mar 20, 2025, 3:58 PM
20 points
21 comments
1 min read

METR: Measuring AI Ability to Complete Long Tasks

Zach Stein-Perlman
Mar 19, 2025, 4:00 PM
234 points
94 comments
5 min read
(metr.org)

Clarifying METR’s Auditing Role

Beth Barnes
May 30, 2024, 6:41 PM
108 points
1 comment
2 min read

Introducing METR’s Autonomy Evaluation Resources

Mar 15, 2024, 11:16 PM
90 points
0 comments
1 min read
(metr.github.io)

METR: AI models can be dangerous before public deployment

UnofficialLinkpostBot
Feb 26, 2025, 8:19 PM
16 points
0 comments
3 min read
(metr.org)

ARC Evals: Responsible Scaling Policies

Zach Stein-Perlman
Sep 28, 2023, 4:30 AM
40 points
10 comments
2 min read
1 review
(evals.alignment.org)

METR is hiring ML Research Engineers and Scientists

Xodarap
Jun 5, 2024, 9:27 PM
5 points
0 comments
1 min read
(metr.org)