Thane Ruthenis
Karma: 3,507
A Crisper Explanation of Simulacrum Levels · 23 Dec 2023 22:13 UTC · 83 points · 13 comments · 13 min read

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations) · 22 Dec 2023 20:19 UTC · 71 points · 13 comments · 6 min read

Most People Don’t Realize We Have No Idea How Our AIs Work · 21 Dec 2023 20:02 UTC · 151 points · 42 comments · 1 min read

How Would an Utopia-Maximizer Look Like? · 20 Dec 2023 20:01 UTC · 31 points · 23 comments · 10 min read

Don’t Share Information Exfohazardous on Others’ AI-Risk Models · 19 Dec 2023 20:09 UTC · 67 points · 11 comments · 1 min read

The Shortest Path Between Scylla and Charybdis · 18 Dec 2023 20:08 UTC · 50 points · 8 comments · 5 min read

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans · 17 Dec 2023 20:28 UTC · 29 points · 7 comments · 11 min read

“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity · 16 Dec 2023 20:08 UTC · 179 points · 34 comments · 5 min read

Current AIs Provide Nearly No Data Relevant to AGI Alignment · 15 Dec 2023 20:16 UTC · 112 points · 152 comments · 8 min read

Hands-On Experience Is Not Magic · 27 May 2023 16:57 UTC · 21 points · 14 comments · 5 min read

A Case for the Least Forgiving Take On Alignment · 2 May 2023 21:34 UTC · 99 points · 82 comments · 22 min read

World-Model Interpretability Is All We Need · 14 Jan 2023 19:37 UTC · 29 points · 22 comments · 21 min read

Internal Interfaces Are a High-Priority Interpretability Target · 29 Dec 2022 17:49 UTC · 26 points · 6 comments · 7 min read

In Defense of Wrapper-Minds · 28 Dec 2022 18:28 UTC · 23 points · 38 comments · 3 min read

Accurate Models of AI Risk Are Hyperexistential Exfohazards · 25 Dec 2022 16:50 UTC · 30 points · 38 comments · 9 min read

Corrigibility Via Thought-Process Deference · 24 Nov 2022 17:06 UTC · 17 points · 5 comments · 9 min read

Value Formation: An Overarching Model · 15 Nov 2022 17:16 UTC · 34 points · 20 comments · 34 min read

Greed Is the Root of This Evil · 13 Oct 2022 20:40 UTC · 18 points · 7 comments · 8 min read

Are Generative World Models a Mesa-Optimization Risk? · 29 Aug 2022 18:37 UTC · 13 points · 2 comments · 3 min read

AI Risk in Terms of Unstable Nuclear Software · 26 Aug 2022 18:49 UTC · 30 points · 1 comment · 6 min read