AXRP
Tag
Last edit: 30 Dec 2024 10:23 UTC by Dakara
AI X-Risk Research Podcast is a podcast hosted by Daniel Filan.
See also: Audio, Interviews
Video/animation: Neel Nanda explains what mechanistic interpretability is | DanielFilan | 22 Feb 2023 22:42 UTC | 24 points | 7 comments | 1 min read | (youtu.be)
AXRP Episode 32 - Understanding Agency with Jan Kulveit | DanielFilan | 30 May 2024 3:50 UTC | 20 points | 0 comments | 53 min read
AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt | DanielFilan | 11 Apr 2024 21:30 UTC | 69 points | 10 comments | 107 min read
AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil | DanielFilan | 17 Apr 2024 21:42 UTC | 12 points | 0 comments | 65 min read
AXRP Episode 29 - Science of Deep Learning with Vikrant Varma | DanielFilan | 25 Apr 2024 19:10 UTC | 20 points | 1 comment | 63 min read
AXRP Episode 38.3 - Erik Jenner on Learned Look-Ahead | DanielFilan | 12 Dec 2024 5:40 UTC | 20 points | 0 comments | 16 min read
AXRP Episode 30 - AI Security with Jeffrey Ladish | DanielFilan | 1 May 2024 2:50 UTC | 25 points | 0 comments | 79 min read
AXRP Episode 31 - Singular Learning Theory with Daniel Murfet | DanielFilan | 7 May 2024 3:50 UTC | 72 points | 4 comments | 71 min read
AXRP Episode 33 - RLHF Problems with Scott Emmons | DanielFilan | 12 Jun 2024 3:30 UTC | 34 points | 0 comments | 56 min read
AXRP Episode 34 - AI Evaluations with Beth Barnes | DanielFilan | 28 Jul 2024 3:30 UTC | 23 points | 0 comments | 69 min read
AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics | DanielFilan | 29 Sep 2024 5:50 UTC | 25 points | 0 comments | 55 min read
AXRP Episode 37 - Jaime Sevilla on Forecasting AI | DanielFilan | 4 Oct 2024 21:00 UTC | 21 points | 3 comments | 56 min read
AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems | DanielFilan | 14 Nov 2024 7:00 UTC | 14 points | 0 comments | 12 min read
AXRP Episode 38.1 - Alan Chan on Agent Infrastructure | DanielFilan | 16 Nov 2024 23:30 UTC | 12 points | 0 comments | 14 min read
AXRP Episode 38.2 - Jesse Hoogland on Singular Learning Theory | DanielFilan | 27 Nov 2024 6:30 UTC | 34 points | 0 comments | 10 min read
AXRP Episode 39 - Evan Hubinger on Model Organisms of Misalignment | DanielFilan | 1 Dec 2024 6:00 UTC | 41 points | 0 comments | 67 min read
AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson | DanielFilan | 12 Apr 2023 21:30 UTC | 22 points | 2 comments | 68 min read
AXRP Episode 26 - AI Governance with Elizabeth Seger | DanielFilan | 26 Nov 2023 23:00 UTC | 14 points | 0 comments | 66 min read
AXRP Episode 21 - Interpretability for Engineers with Stephen Casper | DanielFilan | 2 May 2023 0:50 UTC | 12 points | 1 comment | 66 min read
AXRP Episode 22 - Shard Theory with Quintin Pope | DanielFilan | 15 Jun 2023 19:00 UTC | 52 points | 11 comments | 93 min read
AXRP announcement: Survey, Store Closing, Patreon | DanielFilan | 28 Jun 2023 23:40 UTC | 14 points | 0 comments | 1 min read
AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu | DanielFilan | 27 Jul 2023 1:50 UTC | 22 points | 0 comments | 72 min read
AXRP Episode 24 - Superalignment with Jan Leike | DanielFilan | 27 Jul 2023 4:00 UTC | 55 points | 3 comments | 69 min read
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld | DanielFilan | 3 Oct 2023 21:50 UTC | 43 points | 0 comments | 92 min read
AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo | DanielFilan | 31 Mar 2022 5:20 UTC | 24 points | 1 comment | 48 min read
AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy | DanielFilan | 5 Apr 2022 23:10 UTC | 25 points | 10 comments | 52 min read
AXRP Episode 15 - Natural Abstractions with John Wentworth | DanielFilan | 23 May 2022 5:40 UTC | 34 points | 1 comment | 58 min read
AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving | DanielFilan | 1 Jul 2022 22:20 UTC | 20 points | 0 comments | 37 min read
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler | DanielFilan | 21 Aug 2022 23:50 UTC | 16 points | 0 comments | 35 min read
AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong | DanielFilan | 3 Sep 2022 23:12 UTC | 12 points | 1 comment | 39 min read
AXRP Episode 19 - Mechanistic Interpretability with Neel Nanda | DanielFilan | 4 Feb 2023 3:00 UTC | 45 points | 0 comments | 117 min read
AXRP: Store, Patreon, Video | DanielFilan | 7 Feb 2023 4:50 UTC | 12 points | 0 comments | 1 min read
AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization | DanielFilan | 24 Aug 2024 22:30 UTC | 21 points | 0 comments | 74 min read
AXRP Episode 11 - Attainable Utility and Power with Alex Turner | DanielFilan | 25 Sep 2021 21:10 UTC | 19 points | 5 comments | 53 min read
AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell | DanielFilan | 8 Jun 2021 23:20 UTC | 22 points | 1 comment | 72 min read
AXRP Episode 2 - Learning Human Biases with Rohin Shah | DanielFilan | 29 Dec 2020 20:43 UTC | 13 points | 0 comments | 35 min read
AXRP Episode 1 - Adversarial Policies with Adam Gleave | DanielFilan | 29 Dec 2020 20:41 UTC | 12 points | 5 comments | 34 min read
Announcing AXRP, the AI X-risk Research Podcast | DanielFilan | 23 Dec 2020 20:00 UTC | 54 points | 5 comments | 1 min read | (danielfilan.com)
“Infra-Bayesianism with Vanessa Kosoy” – Watch/Discuss Party | Ben Pace | 22 Mar 2021 23:44 UTC | 27 points | 45 comments | 1 min read
AXRP Episode 12 - AI Existential Risk with Paul Christiano | DanielFilan | 2 Dec 2021 2:20 UTC | 38 points | 0 comments | 126 min read
AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant | DanielFilan | 24 Jun 2021 22:10 UTC | 59 points | 2 comments | 59 min read
AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger | DanielFilan | 18 Feb 2021 0:03 UTC | 43 points | 10 comments | 87 min read
AXRP Episode 10 - AI’s Future and Impacts with Katja Grace | DanielFilan | 23 Jul 2021 22:10 UTC | 34 points | 2 comments | 77 min read
AXRP Episode 7 - Side Effects with Victoria Krakovna | DanielFilan | 14 May 2021 3:50 UTC | 34 points | 6 comments | 43 min read
AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch | DanielFilan | 29 Dec 2020 20:45 UTC | 27 points | 0 comments | 28 min read
AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy | DanielFilan | 10 Mar 2021 4:30 UTC | 35 points | 12 comments | 36 min read
AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra | DanielFilan | 28 May 2021 0:20 UTC | 24 points | 1 comment | 67 min read
AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes | DanielFilan | 8 Apr 2021 21:20 UTC | 26 points | 3 comments | 60 min read