
Thane Ruthenis

Karma: 4,135

Towards the Operationalization of Philosophy & Wisdom

Thane Ruthenis · 28 Oct 2024 19:45 UTC
20 points
2 comments · 33 min read · LW link
(aiimpacts.org)

Thane Ruthenis’s Shortform

Thane Ruthenis · 13 Sep 2024 20:52 UTC
7 points
4 comments · 1 min read · LW link

A Crisper Explanation of Simulacrum Levels

Thane Ruthenis · 23 Dec 2023 22:13 UTC
86 points
13 comments · 13 min read · LW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis · 22 Dec 2023 20:19 UTC
74 points
14 comments · 6 min read · LW link

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis · 21 Dec 2023 20:02 UTC
158 points
42 comments · 1 min read · LW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis · 20 Dec 2023 20:01 UTC
31 points
23 comments · 10 min read · LW link

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis · 19 Dec 2023 20:09 UTC
67 points
11 comments · 1 min read · LW link

The Shortest Path Between Scylla and Charybdis

Thane Ruthenis · 18 Dec 2023 20:08 UTC
50 points
8 comments · 5 min read · LW link

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans

Thane Ruthenis · 17 Dec 2023 20:28 UTC
29 points
7 comments · 11 min read · LW link

“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity

Thane Ruthenis · 16 Dec 2023 20:08 UTC
180 points
34 comments · 5 min read · LW link

Current AIs Provide Nearly No Data Relevant to AGI Alignment

Thane Ruthenis · 15 Dec 2023 20:16 UTC
114 points
155 comments · 8 min read · LW link

Hands-On Experience Is Not Magic

Thane Ruthenis · 27 May 2023 16:57 UTC
21 points
14 comments · 5 min read · LW link

A Case for the Least Forgiving Take On Alignment

Thane Ruthenis · 2 May 2023 21:34 UTC
100 points
84 comments · 22 min read · LW link

World-Model Interpretability Is All We Need

Thane Ruthenis · 14 Jan 2023 19:37 UTC
35 points
22 comments · 21 min read · LW link

Internal Interfaces Are a High-Priority Interpretability Target

Thane Ruthenis · 29 Dec 2022 17:49 UTC
26 points
6 comments · 7 min read · LW link

In Defense of Wrapper-Minds

Thane Ruthenis · 28 Dec 2022 18:28 UTC
24 points
38 comments · 3 min read · LW link

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Thane Ruthenis · 25 Dec 2022 16:50 UTC
31 points
38 comments · 9 min read · LW link

Corrigibility Via Thought-Process Deference

Thane Ruthenis · 24 Nov 2022 17:06 UTC
17 points
5 comments · 9 min read · LW link

Value Formation: An Overarching Model

Thane Ruthenis · 15 Nov 2022 17:16 UTC
34 points
20 comments · 34 min read · LW link

Greed Is the Root of This Evil

Thane Ruthenis · 13 Oct 2022 20:40 UTC
18 points
7 comments · 8 min read · LW link