
aog

Karma: 1,572

Research Priorities for Hardware-Enabled Mechanisms (HEMs)

aog · Apr 30, 2025, 5:43 PM
16 points
2 comments · 15 min read · LW link
(www.longview.org)

aog’s Shortform

aog · Apr 19, 2025, 10:07 PM
6 points
21 comments · LW link

Benchmarking LLM Agents on Kaggle Competitions

aog · Mar 22, 2024, 1:09 PM
15 points
4 comments · 5 min read · LW link

Adversarial Robustness Could Help Prevent Catastrophic Misuse

aog · Dec 11, 2023, 7:12 PM
30 points
18 comments · 9 min read · LW link

Unsupervised Methods for Concept Discovery in AlphaZero

aog · Oct 26, 2023, 7:05 PM
9 points
0 comments · 1 min read · LW link
(arxiv.org)

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

Sep 13, 2023, 6:03 PM
15 points
1 comment · 5 min read · LW link
(newsletter.mlsafety.org)

Hoodwinked: Evaluating Deception Capabilities in Large Language Models

aog · Aug 25, 2023, 7:39 PM
25 points
3 comments · 3 min read · LW link

Learning Transformer Programs [Linkpost]

aog · Jun 8, 2023, 12:16 AM
7 points
0 comments · 1 min read · LW link
(arxiv.org)

Full Automation is Unlikely and Unnecessary for Explosive Growth

aog · May 31, 2023, 9:55 PM
28 points
3 comments · 5 min read · LW link

Model-driven feedback could amplify alignment failures

aog · Jan 30, 2023, 12:00 AM
21 points
1 comment · 2 min read · LW link

Analysis: US restricts GPU sales to China

aog · Oct 7, 2022, 6:38 PM
102 points
58 comments · 5 min read · LW link

Git Re-Basin: Merging Models modulo Permutation Symmetries [Linkpost]

aog · Sep 14, 2022, 8:55 AM
21 points
0 comments · 2 min read · LW link
(arxiv.org)

Argument against 20% GDP growth from AI within 10 years [Linkpost]

aog · Sep 12, 2022, 4:08 AM
59 points
20 comments · 5 min read · LW link
(twitter.com)

ML Model Attribution Challenge [Linkpost]

aog · Aug 30, 2022, 7:34 PM
11 points
0 comments · 1 min read · LW link
(mlmac.io)

Emergent Abilities of Large Language Models [Linkpost]

aog · Aug 10, 2022, 6:02 PM
25 points
2 comments · 1 min read · LW link
(arxiv.org)

Key Papers in Language Model Safety

aog · Jun 20, 2022, 3:00 PM
40 points
1 comment · 22 min read · LW link

Yudkowsky Contra Christiano on AI Takeoff Speeds [Linkpost]

aog · Apr 5, 2022, 2:09 AM
18 points
0 comments · 11 min read · LW link

[Link] Did AlphaStar just click faster?

aog · Jan 28, 2019, 8:23 PM
4 points
14 comments · 1 min read · LW link