
Anthropic (org)

Last edit: 31 Dec 2024 22:02 UTC by ryan_greenblatt

Anthropic is an AI safety and research company based in San Francisco, known for developing the Claude family of AI models and for publishing research on AI safety.

Not to be confused with anthropics.

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
172 points
39 comments · 2 min read · LW link
(www.anthropic.com)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
166 points
31 comments · 4 min read · LW link

Toy Models of Superposition

evhub · 21 Sep 2022 23:48 UTC
69 points
4 comments · 5 min read · LW link · 1 review
(transformer-circuits.pub)

Why I’m joining Anthropic

evhub · 5 Jan 2023 1:12 UTC
118 points
4 comments · 1 min read · LW link

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC
100 points
13 comments · 1 min read · LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash · 3 Feb 2023 19:13 UTC
73 points
14 comments · 1 min read · LW link
(www.ft.com)

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget · 4 Jul 2022 18:38 UTC
21 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Transformer Circuits

evhub · 22 Dec 2021 21:09 UTC
144 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · 5 Oct 2023 21:01 UTC
288 points
22 comments · 2 min read · LW link · 1 review
(transformer-circuits.pub)

Anthropic is further accelerating the Arms Race?

sapphire · 6 Apr 2023 23:29 UTC
82 points
22 comments · 1 min read · LW link
(techcrunch.com)

OMMC Announces RIP

1 Apr 2024 23:20 UTC
189 points
5 comments · 2 min read · LW link

Anthropic’s Certificate of Incorporation

Zach Stein-Perlman · 12 Jun 2024 13:00 UTC
115 points
7 comments · 4 min read · LW link

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter · 24 Dec 2021 7:24 UTC
12 points
3 comments · 1 min read · LW link
(www.youtube.com)

Request to AGI organizations: Share your views on pausing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments · 1 min read · LW link

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds · 1 Nov 2023 18:10 UTC
85 points
1 comment · 4 min read · LW link
(www.anthropic.com)

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds · 20 May 2024 4:14 UTC
30 points
21 comments · 10 min read · LW link
(www.anthropic.com)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · 21 May 2024 20:15 UTC
157 points
16 comments · 3 min read · LW link

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman · 27 May 2024 13:00 UTC
199 points
21 comments · 2 min read · LW link

On Anthropic’s Sleeper Agents Paper

Zvi · 17 Jan 2024 16:10 UTC
54 points
5 comments · 36 min read · LW link
(thezvi.wordpress.com)

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
182 points
23 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · 28 Oct 2023 21:06 UTC
46 points
4 comments · 3 min read · LW link

Anthropic AI made the right call

bhauth · 15 Apr 2024 0:39 UTC
22 points
20 comments · 1 min read · LW link

On Claude 3.5 Sonnet

Zvi · 24 Jun 2024 12:00 UTC
95 points
14 comments · 13 min read · LW link
(thezvi.wordpress.com)

John Schulman leaves OpenAI for Anthropic

Sodium · 6 Aug 2024 1:23 UTC
57 points
0 comments · 1 min read · LW link

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds · 15 Oct 2024 16:46 UTC
52 points
3 comments · 3 min read · LW link
(www.anthropic.com)

Anthropic rewrote its RSP

Zach Stein-Perlman · 15 Oct 2024 14:25 UTC
46 points
19 comments · 6 min read · LW link

Anthropic: Three Sketches of ASL-4 Safety Case Components

Zach Stein-Perlman · 6 Nov 2024 16:00 UTC
95 points
33 comments · 1 min read · LW link
(alignment.anthropic.com)

Anthropic CEO calls for RSI

Andrea_Miotti · 29 Jan 2025 16:54 UTC
22 points
3 comments · 1 min read · LW link
(darioamodei.com)

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · 26 Jul 2023 4:48 UTC
32 points
1 comment · 3 min read · LW link
(www.anthropic.com)

Frontier Model Forum

Zach Stein-Perlman · 26 Jul 2023 14:30 UTC
27 points
0 comments · 4 min read · LW link
(blog.google)

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley · 25 Sep 2023 14:55 UTC
44 points
8 comments · 1 min read · LW link
(twitter.com)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds · 19 Sep 2023 15:09 UTC
83 points
26 comments · 3 min read · LW link · 1 review
(www.anthropic.com)

A Summary Of Anthropic’s First Paper

Sam Ringer · 30 Dec 2021 0:48 UTC
85 points
1 comment · 8 min read · LW link

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · 26 Feb 2022 12:46 UTC
44 points
3 comments · 11 min read · LW link

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · 16 Feb 2023 19:47 UTC
65 points
9 comments · 1 min read · LW link
(arxiv.org)

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember · 7 Mar 2023 16:47 UTC
46 points
2 comments · 79 min read · LW link
(futureoflife.org)

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · 9 Mar 2023 17:34 UTC
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

Quick Thoughts on Scaling Monosemanticity

Joel Burget · 23 May 2024 16:22 UTC
28 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

rife · 6 Jan 2025 17:34 UTC
4 points
20 comments · 1 min read · LW link
(awakenmoon.ai)

Measuring and Improving the Faithfulness of Model-Generated Reasoning

18 Jul 2023 16:36 UTC
111 points
15 comments · 6 min read · LW link · 1 review

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · 7 Oct 2023 23:30 UTC
137 points
8 comments · 4 min read · LW link

Introducing the Anthropic Fellows Program

30 Nov 2024 23:47 UTC
26 points
0 comments · 4 min read · LW link
(alignment.anthropic.com)

[Preprint] Pretraining Language Models with Human Preferences

Giulio · 21 Feb 2023 11:44 UTC
12 points
0 comments · 1 min read · LW link
(arxiv.org)

Cicadas, Anthropic, and the bilateral alignment problem

kromem · 22 May 2024 11:09 UTC
28 points
6 comments · 5 min read · LW link

Anthropic teams up with Palantir and AWS to sell AI to defense customers

Matrice Jacobine · 9 Nov 2024 11:50 UTC
9 points
0 comments · 2 min read · LW link
(techcrunch.com)

Alignment Faking in Large Language Models

18 Dec 2024 17:19 UTC
476 points
68 comments · 10 min read · LW link

Anthropic: The case for targeted regulation

anaguma · 5 Nov 2024 7:07 UTC
11 points
0 comments · 2 min read · LW link
(www.anthropic.com)

The limited upside of interpretability

Peter S. Park · 15 Nov 2022 18:46 UTC
13 points
11 comments · 1 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
301 points
33 comments · 2 min read · LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois · 19 Dec 2022 22:42 UTC
5 points
6 comments · 1 min read · LW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments · 2 min read · LW link
(arxiv.org)

Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs

24 May 2023 21:06 UTC
34 points
1 comment · 1 min read · LW link
(www.gov.uk)

Anthropic | Charting a Path to AI Accountability

Gabe M · 14 Jun 2023 4:43 UTC
34 points
2 comments · 3 min read · LW link
(www.anthropic.com)

Dario Amodei: Machines of Loving Grace

Matrice Jacobine · 11 Oct 2024 21:43 UTC
62 points
26 comments · 1 min read · LW link
(darioamodei.com)

[Question] Has Anthropic checked if Claude fakes alignment for intended values too?

Maloew · 23 Dec 2024 0:43 UTC
4 points
1 comment · 1 min read · LW link

Can We Predict Persuasiveness Better Than Anthropic?

Lennart Finke · 4 Aug 2024 14:05 UTC
22 points
5 comments · 4 min read · LW link

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · 28 Jul 2023 8:41 UTC
7 points
5 comments · 3 min read · LW link