Anthropic (org)
Tag
Last edit: 25 Dec 2021 4:12 UTC by Multicore
Anthropic is an AI organization. Not to be confused with anthropics.
Anthropic’s Core Views on AI Safety
Zac Hatfield-Dodds
9 Mar 2023 16:55 UTC
172
points
39
comments
2
min read
LW
link
(www.anthropic.com)
My understanding of Anthropic strategy
Swimmer963 (Miranda Dixon-Luinenburg)
15 Feb 2023 1:56 UTC
166
points
31
comments
4
min read
LW
link
Why I’m joining Anthropic
evhub
5 Jan 2023 1:12 UTC
121
points
4
comments
1
min read
LW
link
Toy Models of Superposition
evhub
21 Sep 2022 23:48 UTC
69
points
4
comments
5
min read
LW
link
1
review
(transformer-circuits.pub)
Concrete Reasons for Hope about AI
Zac Hatfield-Dodds
14 Jan 2023 1:22 UTC
100
points
13
comments
1
min read
LW
link
[Linkpost] Google invested $300M in Anthropic in late 2022
Akash
3 Feb 2023 19:13 UTC
73
points
14
comments
1
min read
LW
link
(www.ft.com)
Anthropic’s SoLU (Softmax Linear Unit)
Joel Burget
4 Jul 2022 18:38 UTC
21
points
1
comment
4
min read
LW
link
(transformer-circuits.pub)
Transformer Circuits
evhub
22 Dec 2021 21:09 UTC
144
points
4
comments
3
min read
LW
link
(transformer-circuits.pub)
Mechanistic Interpretability for the MLP Layers (rough early thoughts)
MadHatter
24 Dec 2021 7:24 UTC
12
points
3
comments
1
min read
LW
link
(www.youtube.com)
Anthropic is further accelerating the Arms Race?
sapphire
6 Apr 2023 23:29 UTC
82
points
22
comments
1
min read
LW
link
(techcrunch.com)
OMMC Announces RIP
Adam Scholl
and
aysja
1 Apr 2024 23:20 UTC
186
points
5
comments
2
min read
LW
link
Anthropic’s Certificate of Incorporation
Zach Stein-Perlman
12 Jun 2024 13:00 UTC
115
points
4
comments
4
min read
LW
link
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Zac Hatfield-Dodds
5 Oct 2023 21:01 UTC
287
points
21
comments
2
min read
LW
link
(transformer-circuits.pub)
On Anthropic’s Sleeper Agents Paper
Zvi
17 Jan 2024 16:10 UTC
54
points
5
comments
36
min read
LW
link
(thezvi.wordpress.com)
Podcast Transcript: Daniela and Dario Amodei on Anthropic
remember
7 Mar 2023 16:47 UTC
46
points
2
comments
79
min read
LW
link
(futureoflife.org)
Maybe Anthropic’s Long-Term Benefit Trust is powerless
Zach Stein-Perlman
27 May 2024 13:00 UTC
199
points
21
comments
2
min read
LW
link
Anthropic: Core Views on AI Safety: When, Why, What, and How
jonmenaster
9 Mar 2023 17:34 UTC
17
points
1
comment
22
min read
LW
link
(www.anthropic.com)
How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?
Owain_Evans
26 Feb 2022 12:46 UTC
44
points
3
comments
11
min read
LW
link
A Summary Of Anthropic’s First Paper
Sam Ringer
30 Dec 2021 0:48 UTC
85
points
1
comment
8
min read
LW
link
Request to AGI organizations: Share your views on pausing AI progress
Akash
and
simeon_c
11 Apr 2023 17:30 UTC
141
points
11
comments
1
min read
LW
link
Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy
Zac Hatfield-Dodds
1 Nov 2023 18:10 UTC
85
points
1
comment
4
min read
LW
link
(www.anthropic.com)
Introducing Alignment Stress-Testing at Anthropic
evhub
12 Jan 2024 23:51 UTC
182
points
23
comments
2
min read
LW
link
Vaniver’s thoughts on Anthropic’s RSP
Vaniver
28 Oct 2023 21:06 UTC
46
points
4
comments
3
min read
LW
link
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper
21 May 2024 20:15 UTC
157
points
16
comments
3
min read
LW
link
Anthropic AI made the right call
bhauth
15 Apr 2024 0:39 UTC
22
points
20
comments
1
min read
LW
link
John Schulman leaves OpenAI for Anthropic
Sodium
6 Aug 2024 1:23 UTC
57
points
0
comments
1
min read
LW
link
Anthropic Observations
Zvi
25 Jul 2023 12:50 UTC
104
points
1
comment
10
min read
LW
link
(thezvi.wordpress.com)
Frontier Model Security
Vaniver
26 Jul 2023 4:48 UTC
31
points
1
comment
3
min read
LW
link
(www.anthropic.com)
Frontier Model Forum
Zach Stein-Perlman
26 Jul 2023 14:30 UTC
27
points
0
comments
4
min read
LW
link
(blog.google)
On Claude 3.5 Sonnet
Zvi
24 Jun 2024 12:00 UTC
95
points
14
comments
13
min read
LW
link
(thezvi.wordpress.com)
Amazon to invest up to $4 billion in Anthropic
Davis_Kingsley
25 Sep 2023 14:55 UTC
44
points
8
comments
1
min read
LW
link
(twitter.com)
Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust
Zac Hatfield-Dodds
19 Sep 2023 15:09 UTC
83
points
23
comments
3
min read
LW
link
(www.anthropic.com)
Anthropic rewrote its RSP
Zach Stein-Perlman
15 Oct 2024 14:25 UTC
39
points
19
comments
6
min read
LW
link
Anthropic’s updated Responsible Scaling Policy
Zac Hatfield-Dodds
15 Oct 2024 16:46 UTC
51
points
3
comments
3
min read
LW
link
(www.anthropic.com)
Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds
20 May 2024 4:14 UTC
30
points
21
comments
10
min read
LW
link
(www.anthropic.com)
Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)
LawrenceC
16 Feb 2023 19:47 UTC
65
points
9
comments
1
min read
LW
link
(arxiv.org)
Cicadas, Anthropic, and the bilateral alignment problem
kromem
22 May 2024 11:09 UTC
28
points
6
comments
5
min read
LW
link
Quick Thoughts on Scaling Monosemanticity
Joel Burget
23 May 2024 16:22 UTC
28
points
1
comment
4
min read
LW
link
(transformer-circuits.pub)
Can We Predict Persuasiveness Better Than Anthropic?
Lennart Finke
4 Aug 2024 14:05 UTC
22
points
5
comments
4
min read
LW
link
Dario Amodei — Machines of Loving Grace
Matrice Jacobine
11 Oct 2024 21:43 UTC
61
points
26
comments
1
min read
LW
link
(darioamodei.com)
Anthropic—The case for targeted regulation
anaguma
5 Nov 2024 7:07 UTC
11
points
0
comments
2
min read
LW
link
(www.anthropic.com)
Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation
Soroush Pour
,
rusheb
,
Quentin FEUILLADE--MONTIXI
,
Arush
and
scasper
7 Nov 2023 17:59 UTC
36
points
2
comments
2
min read
LW
link
(arxiv.org)
Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs
Arjun Panickssery
,
Baldassare Castiglione
and
Cleo Nardo
24 May 2023 21:06 UTC
34
points
1
comment
1
min read
LW
link
(www.gov.uk)
Anthropic | Charting a Path to AI Accountability
Gabe M
14 Jun 2023 4:43 UTC
34
points
2
comments
3
min read
LW
link
(www.anthropic.com)
AI Awareness through Interaction with Blatantly Alien Models
VojtaKovarik
28 Jul 2023 8:41 UTC
7
points
5
comments
3
min read
LW
link
Measuring and Improving the Faithfulness of Model-Generated Reasoning
Ansh Radhakrishnan
,
tamera
,
karinanguyen
,
Sam Bowman
and
Ethan Perez
18 Jul 2023 16:36 UTC
111
points
14
comments
6
min read
LW
link
Comparing Anthropic’s Dictionary Learning to Ours
Robert_AIZI
7 Oct 2023 23:30 UTC
137
points
8
comments
4
min read
LW
link
The limited upside of interpretability
Peter S. Park
15 Nov 2022 18:46 UTC
13
points
11
comments
1
min read
LW
link
A challenge for AGI organizations, and a challenge for readers
Rob Bensinger
and
Eliezer Yudkowsky
1 Dec 2022 23:11 UTC
301
points
33
comments
2
min read
LW
link
[Question]
Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Yann Dubois
19 Dec 2022 22:42 UTC
5
points
6
comments
1
min read
LW
link
[Preprint] Pretraining Language Models with Human Preferences
Giulio
21 Feb 2023 11:44 UTC
12
points
0
comments
1
min read
LW
link
(arxiv.org)