aog

Karma: 1,570

Research Priorities for Hardware-Enabled Mechanisms (HEMs)

aog, 30 Apr 2025 17:43 UTC
16 points
2 comments, 15 min read, LW link
(www.longview.org)

aog’s Shortform

aog, 19 Apr 2025 22:07 UTC
6 points
21 comments, LW link

Benchmarking LLM Agents on Kaggle Competitions

aog, 22 Mar 2024 13:09 UTC
15 points
4 comments, 5 min read, LW link

Adversarial Robustness Could Help Prevent Catastrophic Misuse

aog, 11 Dec 2023 19:12 UTC
30 points
18 comments, 9 min read, LW link

Unsupervised Methods for Concept Discovery in AlphaZero

aog, 26 Oct 2023 19:05 UTC
9 points
0 comments, 1 min read, LW link
(arxiv.org)

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

13 Sep 2023 18:03 UTC
15 points
1 comment, 5 min read, LW link
(newsletter.mlsafety.org)

Hoodwinked: Evaluating Deception Capabilities in Large Language Models

aog, 25 Aug 2023 19:39 UTC
25 points
3 comments, 3 min read, LW link

Learning Transformer Programs [Linkpost]

aog, 8 Jun 2023 0:16 UTC
7 points
0 comments, 1 min read, LW link
(arxiv.org)

Full Automation is Unlikely and Unnecessary for Explosive Growth

aog, 31 May 2023 21:55 UTC
28 points
3 comments, 5 min read, LW link

Model-driven feedback could amplify alignment failures

aog, 30 Jan 2023 0:00 UTC
21 points
1 comment, 2 min read, LW link

Analysis: US restricts GPU sales to China

aog, 7 Oct 2022 18:38 UTC
102 points
58 comments, 5 min read, LW link

Git Re-Basin: Merging Models modulo Permutation Symmetries [Linkpost]

aog, 14 Sep 2022 8:55 UTC
21 points
0 comments, 2 min read, LW link
(arxiv.org)

Argument against 20% GDP growth from AI within 10 years [Linkpost]

aog, 12 Sep 2022 4:08 UTC
59 points
20 comments, 5 min read, LW link
(twitter.com)

ML Model Attribution Challenge [Linkpost]

aog, 30 Aug 2022 19:34 UTC
11 points
0 comments, 1 min read, LW link
(mlmac.io)

Emergent Abilities of Large Language Models [Linkpost]

aog, 10 Aug 2022 18:02 UTC
25 points
2 comments, 1 min read, LW link
(arxiv.org)

Key Papers in Language Model Safety

aog, 20 Jun 2022 15:00 UTC
40 points
1 comment, 22 min read, LW link

Yudkowsky Contra Christiano on AI Takeoff Speeds [Linkpost]

aog, 5 Apr 2022 2:09 UTC
18 points
0 comments, 11 min read, LW link

[Link] Did AlphaStar just click faster?

aog, 28 Jan 2019 20:23 UTC
4 points
14 comments, 1 min read, LW link