
Anthropic (org)

Last edit: Dec 31, 2024, 10:02 PM by ryan_greenblatt

Anthropic is an AI company based in San Francisco. The company is known for developing the Claude AI family and publishing research on AI safety.

Not to be confused with anthropics.

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · Mar 9, 2023, 4:55 PM
172 points
39 comments · 2 min read · LW link
(www.anthropic.com)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · Feb 15, 2023, 1:56 AM
166 points
31 comments · 4 min read · LW link

Toy Models of Superposition

evhub · Sep 21, 2022, 11:48 PM
69 points
4 comments · 5 min read · LW link · 1 review
(transformer-circuits.pub)

Why I’m joining Anthropic

evhub · Jan 5, 2023, 1:12 AM
118 points
4 comments · 1 min read · LW link

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · Jan 14, 2023, 1:22 AM
100 points
13 comments · 1 min read · LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash · Feb 3, 2023, 7:13 PM
73 points
14 comments · 1 min read · LW link
(www.ft.com)

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget · Jul 4, 2022, 6:38 PM
21 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Transformer Circuits

evhub · Dec 22, 2021, 9:09 PM
144 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · Oct 5, 2023, 9:01 PM
288 points
22 comments · 2 min read · LW link · 1 review
(transformer-circuits.pub)

Anthropic is further accelerating the Arms Race?

sapphire · Apr 6, 2023, 11:29 PM
82 points
22 comments · 1 min read · LW link
(techcrunch.com)

OMMC Announces RIP

Apr 1, 2024, 11:20 PM
189 points
5 comments · 2 min read · LW link

Anthropic’s Certificate of Incorporation

Zach Stein-Perlman · Jun 12, 2024, 1:00 PM
115 points
7 comments · 4 min read · LW link

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter · Dec 24, 2021, 7:24 AM
12 points
3 comments · 1 min read · LW link
(www.youtube.com)

Request to AGI organizations: Share your views on pausing AI progress

Apr 11, 2023, 5:30 PM
141 points
11 comments · 1 min read · LW link

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds · Nov 1, 2023, 6:10 PM
85 points
1 comment · 4 min read · LW link
(www.anthropic.com)

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds · May 20, 2024, 4:14 AM
30 points
21 comments · 10 min read · LW link
(www.anthropic.com)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · May 21, 2024, 8:15 PM
157 points
16 comments · 3 min read · LW link

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman · May 27, 2024, 1:00 PM
199 points
21 comments · 2 min read · LW link

On Anthropic’s Sleeper Agents Paper

Zvi · Jan 17, 2024, 4:10 PM
54 points
5 comments · 36 min read · LW link
(thezvi.wordpress.com)

Introducing Alignment Stress-Testing at Anthropic

evhub · Jan 12, 2024, 11:51 PM
182 points
23 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · Oct 28, 2023, 9:06 PM
46 points
4 comments · 3 min read · LW link

Anthropic AI made the right call

bhauth · Apr 15, 2024, 12:39 AM
22 points
20 comments · 1 min read · LW link

On Claude 3.5 Sonnet

Zvi · Jun 24, 2024, 12:00 PM
95 points
14 comments · 13 min read · LW link
(thezvi.wordpress.com)

John Schulman leaves OpenAI for Anthropic

Sodium · Aug 6, 2024, 1:23 AM
57 points
0 comments · 1 min read · LW link

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds · Oct 15, 2024, 4:46 PM
52 points
3 comments · 3 min read · LW link
(www.anthropic.com)

Anthropic rewrote its RSP

Zach Stein-Perlman · Oct 15, 2024, 2:25 PM
46 points
19 comments · 6 min read · LW link

Anthropic: Three Sketches of ASL-4 Safety Case Components

Zach Stein-Perlman · Nov 6, 2024, 4:00 PM
95 points
33 comments · 1 min read · LW link
(alignment.anthropic.com)

Anthropic CEO calls for RSI

Andrea_Miotti · Jan 29, 2025, 4:54 PM
31 points
10 comments · 1 min read · LW link
(darioamodei.com)

Anthropic Observations

Zvi · Jul 25, 2023, 12:50 PM
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · Jul 26, 2023, 4:48 AM
32 points
1 comment · 3 min read · LW link
(www.anthropic.com)

Frontier Model Forum

Zach Stein-Perlman · Jul 26, 2023, 2:30 PM
27 points
0 comments · 4 min read · LW link
(blog.google)

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley · Sep 25, 2023, 2:55 PM
44 points
8 comments · 1 min read · LW link
(twitter.com)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds · Sep 19, 2023, 3:09 PM
83 points
26 comments · 3 min read · LW link · 1 review
(www.anthropic.com)

A Summary Of Anthropic’s First Paper

Sam Ringer · Dec 30, 2021, 12:48 AM
85 points
1 comment · 8 min read · LW link

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · Feb 26, 2022, 12:46 PM
44 points
3 comments · 11 min read · LW link

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · Feb 16, 2023, 7:47 PM
65 points
9 comments · 1 min read · LW link
(arxiv.org)

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember · Mar 7, 2023, 4:47 PM
46 points
2 comments · 79 min read · LW link
(futureoflife.org)

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · Mar 9, 2023, 5:34 PM
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

Quick Thoughts on Scaling Monosemanticity

Joel Burget · May 23, 2024, 4:22 PM
28 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

rife · Jan 6, 2025, 5:34 PM
4 points
20 comments · 1 min read · LW link
(awakenmoon.ai)

Measuring and Improving the Faithfulness of Model-Generated Reasoning

Jul 18, 2023, 4:36 PM
111 points
15 comments · 6 min read · LW link · 1 review

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · Oct 7, 2023, 11:30 PM
137 points
8 comments · 4 min read · LW link

Introducing the Anthropic Fellows Program

Nov 30, 2024, 11:47 PM
26 points
0 comments · 4 min read · LW link
(alignment.anthropic.com)

[Preprint] Pretraining Language Models with Human Preferences

Giulio · Feb 21, 2023, 11:44 AM
12 points
0 comments · 1 min read · LW link
(arxiv.org)

Cicadas, Anthropic, and the bilateral alignment problem

kromem · May 22, 2024, 11:09 AM
28 points
6 comments · 5 min read · LW link

Anthropic teams up with Palantir and AWS to sell AI to defense customers

Matrice Jacobine · Nov 9, 2024, 11:50 AM
9 points
0 comments · 2 min read · LW link
(techcrunch.com)

Alignment Faking in Large Language Models

Dec 18, 2024, 5:19 PM
478 points
70 comments · 10 min read · LW link

Anthropic—The case for targeted regulation

anaguma · Nov 5, 2024, 7:07 AM
11 points
0 comments · 2 min read · LW link
(www.anthropic.com)

The limited upside of interpretability

Peter S. Park · Nov 15, 2022, 6:46 PM
13 points
11 comments · 1 min read · LW link

A challenge for AGI organizations, and a challenge for readers

Dec 1, 2022, 11:11 PM
301 points
33 comments · 2 min read · LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois · Dec 19, 2022, 10:42 PM
5 points
6 comments · 1 min read · LW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

Nov 7, 2023, 5:59 PM
36 points
2 comments · 2 min read · LW link
(arxiv.org)

Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs

May 24, 2023, 9:06 PM
34 points
1 comment · 1 min read · LW link
(www.gov.uk)

Anthropic | Charting a Path to AI Accountability

Gabe M · Jun 14, 2023, 4:43 AM
34 points
2 comments · 3 min read · LW link
(www.anthropic.com)

Dario Amodei — Machines of Loving Grace

Matrice Jacobine · Oct 11, 2024, 9:43 PM
62 points
26 comments · 1 min read · LW link
(darioamodei.com)

[Question] Has Anthropic checked if Claude fakes alignment for intended values too?

Maloew · Dec 23, 2024, 12:43 AM
4 points
1 comment · 1 min read · LW link

Can We Predict Persuasiveness Better Than Anthropic?

Lennart Finke · Aug 4, 2024, 2:05 PM
22 points
5 comments · 4 min read · LW link

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · Jul 28, 2023, 8:41 AM
7 points
5 comments · 3 min read · LW link