
Anthropic (org)

Last edit: 25 Dec 2021 4:12 UTC by Multicore

Anthropic is an AI safety and research company, founded in 2021, and the developer of the Claude family of language models.

Not to be confused with anthropics.

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
172 points
39 comments · 2 min read · LW link
(www.anthropic.com)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
166 points
31 comments · 4 min read · LW link

Why I’m joining Anthropic

evhub · 5 Jan 2023 1:12 UTC
121 points
4 comments · 1 min read · LW link

Toy Models of Superposition

evhub · 21 Sep 2022 23:48 UTC
69 points
4 comments · 5 min read · LW link · 1 review
(transformer-circuits.pub)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC
100 points
13 comments · 1 min read · LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash · 3 Feb 2023 19:13 UTC
73 points
14 comments · 1 min read · LW link
(www.ft.com)

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget · 4 Jul 2022 18:38 UTC
21 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Transformer Circuits

evhub · 22 Dec 2021 21:09 UTC
144 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter · 24 Dec 2021 7:24 UTC
12 points
3 comments · 1 min read · LW link
(www.youtube.com)

Anthropic is further accelerating the Arms Race?

sapphire · 6 Apr 2023 23:29 UTC
82 points
22 comments · 1 min read · LW link
(techcrunch.com)

OMMC Announces RIP

1 Apr 2024 23:20 UTC
186 points
5 comments · 2 min read · LW link

Anthropic’s Certificate of Incorporation

Zach Stein-Perlman · 12 Jun 2024 13:00 UTC
115 points
4 comments · 4 min read · LW link

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · 5 Oct 2023 21:01 UTC
287 points
21 comments · 2 min read · LW link
(transformer-circuits.pub)

On Anthropic’s Sleeper Agents Paper

Zvi · 17 Jan 2024 16:10 UTC
54 points
5 comments · 36 min read · LW link
(thezvi.wordpress.com)

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember · 7 Mar 2023 16:47 UTC
46 points
2 comments · 79 min read · LW link
(futureoflife.org)

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman · 27 May 2024 13:00 UTC
199 points
21 comments · 2 min read · LW link

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · 9 Mar 2023 17:34 UTC
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · 26 Feb 2022 12:46 UTC
44 points
3 comments · 11 min read · LW link

A Summary Of Anthropic’s First Paper

Sam Ringer · 30 Dec 2021 0:48 UTC
85 points
1 comment · 8 min read · LW link

Request to AGI organizations: Share your views on pausing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments · 1 min read · LW link

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds · 1 Nov 2023 18:10 UTC
85 points
1 comment · 4 min read · LW link
(www.anthropic.com)

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
182 points
23 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · 28 Oct 2023 21:06 UTC
46 points
4 comments · 3 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · 21 May 2024 20:15 UTC
157 points
16 comments · 3 min read · LW link

Anthropic AI made the right call

bhauth · 15 Apr 2024 0:39 UTC
22 points
20 comments · 1 min read · LW link

John Schulman leaves OpenAI for Anthropic

Sodium · 6 Aug 2024 1:23 UTC
57 points
0 comments · 1 min read · LW link

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · 26 Jul 2023 4:48 UTC
31 points
1 comment · 3 min read · LW link
(www.anthropic.com)

Frontier Model Forum

Zach Stein-Perlman · 26 Jul 2023 14:30 UTC
27 points
0 comments · 4 min read · LW link
(blog.google)

On Claude 3.5 Sonnet

Zvi · 24 Jun 2024 12:00 UTC
95 points
14 comments · 13 min read · LW link
(thezvi.wordpress.com)

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley · 25 Sep 2023 14:55 UTC
44 points
8 comments · 1 min read · LW link
(twitter.com)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds · 19 Sep 2023 15:09 UTC
83 points
23 comments · 3 min read · LW link
(www.anthropic.com)

Anthropic rewrote its RSP

Zach Stein-Perlman · 15 Oct 2024 14:25 UTC
39 points
19 comments · 6 min read · LW link

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds · 15 Oct 2024 16:46 UTC
51 points
3 comments · 3 min read · LW link
(www.anthropic.com)

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds · 20 May 2024 4:14 UTC
30 points
21 comments · 10 min read · LW link
(www.anthropic.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · 16 Feb 2023 19:47 UTC
65 points
9 comments · 1 min read · LW link
(arxiv.org)

Cicadas, Anthropic, and the bilateral alignment problem

kromem · 22 May 2024 11:09 UTC
28 points
6 comments · 5 min read · LW link

Quick Thoughts on Scaling Monosemanticity

Joel Burget · 23 May 2024 16:22 UTC
28 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Can We Predict Persuasiveness Better Than Anthropic?

Lennart Finke · 4 Aug 2024 14:05 UTC
22 points
5 comments · 4 min read · LW link

Dario Amodei — Machines of Loving Grace

Matrice Jacobine · 11 Oct 2024 21:43 UTC
61 points
26 comments · 1 min read · LW link
(darioamodei.com)

Anthropic — The case for targeted regulation

anaguma · 5 Nov 2024 7:07 UTC
11 points
0 comments · 2 min read · LW link
(www.anthropic.com)

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments · 2 min read · LW link
(arxiv.org)

Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs

24 May 2023 21:06 UTC
34 points
1 comment · 1 min read · LW link
(www.gov.uk)

Anthropic | Charting a Path to AI Accountability

Gabe M · 14 Jun 2023 4:43 UTC
34 points
2 comments · 3 min read · LW link
(www.anthropic.com)

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · 28 Jul 2023 8:41 UTC
7 points
5 comments · 3 min read · LW link

Measuring and Improving the Faithfulness of Model-Generated Reasoning

18 Jul 2023 16:36 UTC
111 points
14 comments · 6 min read · LW link

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · 7 Oct 2023 23:30 UTC
137 points
8 comments · 4 min read · LW link

The limited upside of interpretability

Peter S. Park · 15 Nov 2022 18:46 UTC
13 points
11 comments · 1 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
301 points
33 comments · 2 min read · LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois · 19 Dec 2022 22:42 UTC
5 points
6 comments · 1 min read · LW link

[Preprint] Pretraining Language Models with Human Preferences

Giulio · 21 Feb 2023 11:44 UTC
12 points
0 comments · 1 min read · LW link
(arxiv.org)