aog

Karma: 1,572

Research Priorities for Hardware-Enabled Mechanisms (HEMs)

aog · Apr 30, 2025, 5:43 PM

16 points

2 comments · 15 min read

(www.longview.org)

aog’s Shortform

aog · Apr 19, 2025, 10:07 PM

6 points

21 comments

Benchmarking LLM Agents on Kaggle Competitions

aog · Mar 22, 2024, 1:09 PM

15 points

4 comments · 5 min read

Adversarial Robustness Could Help Prevent Catastrophic Misuse

aog · Dec 11, 2023, 7:12 PM

30 points

18 comments · 9 min read

Unsupervised Methods for Concept Discovery in AlphaZero

aog · Oct 26, 2023, 7:05 PM

9 points

0 comments · 1 min read

(arxiv.org)

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

Sep 13, 2023, 6:03 PM

15 points

1 comment · 5 min read

(newsletter.mlsafety.org)

Hoodwinked: Evaluating Deception Capabilities in Large Language Models

aog · Aug 25, 2023, 7:39 PM

25 points

3 comments · 3 min read

Learning Transformer Programs [Linkpost]

aog · Jun 8, 2023, 12:16 AM

7 points

0 comments · 1 min read

(arxiv.org)

Full Automation is Unlikely and Unnecessary for Explosive Growth

aog · May 31, 2023, 9:55 PM

28 points

3 comments · 5 min read

Model-driven feedback could amplify alignment failures

aog · Jan 30, 2023, 12:00 AM

21 points

1 comment · 2 min read

Analysis: US restricts GPU sales to China

aog · Oct 7, 2022, 6:38 PM

102 points

58 comments · 5 min read

Git Re-Basin: Merging Models modulo Permutation Symmetries [Linkpost]

aog · Sep 14, 2022, 8:55 AM

21 points

0 comments · 2 min read

(arxiv.org)

Argument against 20% GDP growth from AI within 10 years [Linkpost]

aog · Sep 12, 2022, 4:08 AM

59 points

20 comments · 5 min read

(twitter.com)

ML Model Attribution Challenge [Linkpost]

aog · Aug 30, 2022, 7:34 PM

11 points

0 comments · 1 min read

(mlmac.io)

Emergent Abilities of Large Language Models [Linkpost]

aog · Aug 10, 2022, 6:02 PM

25 points

2 comments · 1 min read

(arxiv.org)

Key Papers in Language Model Safety

aog · Jun 20, 2022, 3:00 PM

40 points

1 comment · 22 min read

Yudkowsky Contra Christiano on AI Takeoff Speeds [Linkpost]

aog · Apr 5, 2022, 2:09 AM

18 points

0 comments · 11 min read

[Link] Did AlphaStar just click faster?

aog · Jan 28, 2019, 8:23 PM

4 points

14 comments · 1 min read