Thane Ruthenis
Karma: 4,135
Posts (Page 1)
Towards the Operationalization of Philosophy & Wisdom · 28 Oct 2024 19:45 UTC · 20 points · 2 comments · 33 min read · LW link (aiimpacts.org)
Thane Ruthenis’s Shortform · 13 Sep 2024 20:52 UTC · 7 points · 4 comments · 1 min read · LW link
A Crisper Explanation of Simulacrum Levels · 23 Dec 2023 22:13 UTC · 86 points · 13 comments · 13 min read · LW link
Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations) · 22 Dec 2023 20:19 UTC · 74 points · 14 comments · 6 min read · LW link
Most People Don’t Realize We Have No Idea How Our AIs Work · 21 Dec 2023 20:02 UTC · 158 points · 42 comments · 1 min read · LW link
How Would an Utopia-Maximizer Look Like? · 20 Dec 2023 20:01 UTC · 31 points · 23 comments · 10 min read · LW link
Don’t Share Information Exfohazardous on Others’ AI-Risk Models · 19 Dec 2023 20:09 UTC · 67 points · 11 comments · 1 min read · LW link
The Shortest Path Between Scylla and Charybdis · 18 Dec 2023 20:08 UTC · 50 points · 8 comments · 5 min read · LW link
A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans · 17 Dec 2023 20:28 UTC · 29 points · 7 comments · 11 min read · LW link
“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity · 16 Dec 2023 20:08 UTC · 180 points · 34 comments · 5 min read · LW link
Current AIs Provide Nearly No Data Relevant to AGI Alignment · 15 Dec 2023 20:16 UTC · 114 points · 155 comments · 8 min read · LW link
Hands-On Experience Is Not Magic · 27 May 2023 16:57 UTC · 21 points · 14 comments · 5 min read · LW link
A Case for the Least Forgiving Take On Alignment · 2 May 2023 21:34 UTC · 100 points · 84 comments · 22 min read · LW link
World-Model Interpretability Is All We Need · 14 Jan 2023 19:37 UTC · 35 points · 22 comments · 21 min read · LW link
Internal Interfaces Are a High-Priority Interpretability Target · 29 Dec 2022 17:49 UTC · 26 points · 6 comments · 7 min read · LW link
In Defense of Wrapper-Minds · 28 Dec 2022 18:28 UTC · 24 points · 38 comments · 3 min read · LW link
Accurate Models of AI Risk Are Hyperexistential Exfohazards · 25 Dec 2022 16:50 UTC · 31 points · 38 comments · 9 min read · LW link
Corrigibility Via Thought-Process Deference · 24 Nov 2022 17:06 UTC · 17 points · 5 comments · 9 min read · LW link
Value Formation: An Overarching Model · 15 Nov 2022 17:16 UTC · 34 points · 20 comments · 34 min read · LW link
Greed Is the Root of This Evil · 13 Oct 2022 20:40 UTC · 18 points · 7 comments · 8 min read · LW link