
Anthropic (org)

Last edit: 25 Dec 2021 4:12 UTC by Multicore

Anthropic is an AI safety and research company, founded in 2021, and the developer of the Claude family of language models.

Not to be confused with anthropics.

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
172 points
39 comments · 2 min read · LW link
(www.anthropic.com)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
166 points
31 comments · 4 min read · LW link

Why I’m joining Anthropic

evhub · 5 Jan 2023 1:12 UTC
121 points
4 comments · 1 min read · LW link

Toy Models of Superposition

evhub · 21 Sep 2022 23:48 UTC
69 points
4 comments · 5 min read · LW link · 1 review
(transformer-circuits.pub)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC
100 points
13 comments · 1 min read · LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash · 3 Feb 2023 19:13 UTC
73 points
14 comments · 1 min read · LW link
(www.ft.com)

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget · 4 Jul 2022 18:38 UTC
21 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Transformer Circuits

evhub · 22 Dec 2021 21:09 UTC
144 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter · 24 Dec 2021 7:24 UTC
12 points
3 comments · 1 min read · LW link
(www.youtube.com)

Anthropic is further accelerating the Arms Race?

sapphire · 6 Apr 2023 23:29 UTC
82 points
22 comments · 1 min read · LW link
(techcrunch.com)

OMMC Announces RIP

1 Apr 2024 23:20 UTC
186 points
5 comments · 2 min read · LW link

Anthropic’s Certificate of Incorporation

Zach Stein-Perlman · 12 Jun 2024 13:00 UTC
115 points
4 comments · 4 min read · LW link

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · 5 Oct 2023 21:01 UTC
287 points
21 comments · 2 min read · LW link
(transformer-circuits.pub)

On Anthropic’s Sleeper Agents Paper

Zvi · 17 Jan 2024 16:10 UTC
54 points
5 comments · 36 min read · LW link
(thezvi.wordpress.com)

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember · 7 Mar 2023 16:47 UTC
46 points
2 comments · 79 min read · LW link
(futureoflife.org)

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman · 27 May 2024 13:00 UTC
199 points
21 comments · 2 min read · LW link

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · 9 Mar 2023 17:34 UTC
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · 26 Feb 2022 12:46 UTC
44 points
3 comments · 11 min read · LW link

A Summary Of Anthropic’s First Paper

Sam Ringer · 30 Dec 2021 0:48 UTC
85 points
1 comment · 8 min read · LW link

Request to AGI organizations: Share your views on pausing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments · 1 min read · LW link

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds · 1 Nov 2023 18:10 UTC
85 points
1 comment · 4 min read · LW link
(www.anthropic.com)

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
182 points
23 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · 28 Oct 2023 21:06 UTC
46 points
4 comments · 3 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · 21 May 2024 20:15 UTC
157 points
16 comments · 3 min read · LW link

Anthropic AI made the right call

bhauth · 15 Apr 2024 0:39 UTC
22 points
20 comments · 1 min read · LW link

John Schulman leaves OpenAI for Anthropic

Sodium · 6 Aug 2024 1:23 UTC
57 points
0 comments · 1 min read · LW link

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · 26 Jul 2023 4:48 UTC
31 points
1 comment · 3 min read · LW link
(www.anthropic.com)

Frontier Model Forum

Zach Stein-Perlman · 26 Jul 2023 14:30 UTC
27 points
0 comments · 4 min read · LW link
(blog.google)

On Claude 3.5 Sonnet

Zvi · 24 Jun 2024 12:00 UTC
95 points
14 comments · 13 min read · LW link
(thezvi.wordpress.com)

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley · 25 Sep 2023 14:55 UTC
44 points
8 comments · 1 min read · LW link
(twitter.com)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds · 19 Sep 2023 15:09 UTC
83 points
23 comments · 3 min read · LW link
(www.anthropic.com)

Anthropic rewrote its RSP

Zach Stein-Perlman · 15 Oct 2024 14:25 UTC
39 points
19 comments · 6 min read · LW link

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds · 15 Oct 2024 16:46 UTC
51 points
3 comments · 3 min read · LW link
(www.anthropic.com)

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds · 20 May 2024 4:14 UTC
30 points
21 comments · 10 min read · LW link
(www.anthropic.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · 16 Feb 2023 19:47 UTC
65 points
9 comments · 1 min read · LW link
(arxiv.org)

Cicadas, Anthropic, and the bilateral alignment problem

kromem · 22 May 2024 11:09 UTC
28 points
6 comments · 5 min read · LW link

Quick Thoughts on Scaling Monosemanticity

Joel Burget · 23 May 2024 16:22 UTC
28 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

Can We Predict Persuasiveness Better Than Anthropic?

Lennart Finke · 4 Aug 2024 14:05 UTC
22 points
5 comments · 4 min read · LW link

Dario Amodei — Machines of Loving Grace

Matrice Jacobine · 11 Oct 2024 21:43 UTC
61 points
26 comments · 1 min read · LW link
(darioamodei.com)

Anthropic — The case for targeted regulation

anaguma · 5 Nov 2024 7:07 UTC
11 points
0 comments · 2 min read · LW link
(www.anthropic.com)

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments · 2 min read · LW link
(arxiv.org)

Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs

24 May 2023 21:06 UTC
34 points
1 comment · 1 min read · LW link
(www.gov.uk)

Anthropic | Charting a Path to AI Accountability

Gabe M · 14 Jun 2023 4:43 UTC
34 points
2 comments · 3 min read · LW link
(www.anthropic.com)

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · 28 Jul 2023 8:41 UTC
7 points
5 comments · 3 min read · LW link

Measuring and Improving the Faithfulness of Model-Generated Reasoning

18 Jul 2023 16:36 UTC
111 points
14 comments · 6 min read · LW link

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · 7 Oct 2023 23:30 UTC
137 points
8 comments · 4 min read · LW link

The limited upside of interpretability

Peter S. Park · 15 Nov 2022 18:46 UTC
13 points
11 comments · 1 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
301 points
33 comments · 2 min read · LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois · 19 Dec 2022 22:42 UTC
5 points
6 comments · 1 min read · LW link

[Preprint] Pretraining Language Models with Human Preferences

Giulio · 21 Feb 2023 11:44 UTC
12 points
0 comments · 1 min read · LW link
(arxiv.org)