RSS

Thane Ruthenis

Karma: 7,527

In­ter­nal In­ter­faces Are a High-Pri­or­ity In­ter­pretabil­ity Target

Thane RuthenisDec 29, 2022, 5:49 PM
26 points
6 comments7 min readLW link

In Defense of Wrap­per-Minds

Thane RuthenisDec 28, 2022, 6:28 PM
24 points
38 comments3 min readLW link

Ac­cu­rate Models of AI Risk Are Hyper­ex­is­ten­tial Exfohazards

Thane RuthenisDec 25, 2022, 4:50 PM
33 points
38 comments9 min readLW link

Cor­rigi­bil­ity Via Thought-Pro­cess Deference

Thane RuthenisNov 24, 2022, 5:06 PM
18 points
5 comments9 min readLW link

Value For­ma­tion: An Over­ar­ch­ing Model

Thane RuthenisNov 15, 2022, 5:16 PM
34 points
20 comments34 min readLW link

Greed Is the Root of This Evil

Thane RuthenisOct 13, 2022, 8:40 PM
21 points
7 comments8 min readLW link

Are Gen­er­a­tive World Models a Mesa-Op­ti­miza­tion Risk?

Thane RuthenisAug 29, 2022, 6:37 PM
14 points
2 comments3 min readLW link

AI Risk in Terms of Un­sta­ble Nu­clear Software

Thane RuthenisAug 26, 2022, 6:49 PM
30 points
1 comment6 min readLW link

Broad Pic­ture of Hu­man Values

Thane RuthenisAug 20, 2022, 7:42 PM
42 points
6 comments10 min readLW link

In­ter­pretabil­ity Tools Are an At­tack Channel

Thane RuthenisAug 17, 2022, 6:47 PM
42 points
14 comments1 min readLW link

Con­ver­gence Towards World-Models: A Gears-Level Model

Thane RuthenisAug 4, 2022, 11:31 PM
38 points
1 comment13 min readLW link

What En­vi­ron­ment Prop­er­ties Select Agents For World-Model­ing?

Thane RuthenisJul 23, 2022, 7:27 PM
25 points
1 comment12 min readLW link

Goal Align­ment Is Ro­bust To the Sharp Left Turn

Thane RuthenisJul 13, 2022, 8:23 PM
43 points
16 comments4 min readLW link

Refram­ing the AI Risk

Thane RuthenisJul 1, 2022, 6:44 PM
26 points
7 comments6 min readLW link

Is This Thing Sen­tient, Y/​N?

Thane RuthenisJun 20, 2022, 6:37 PM
4 points
10 comments7 min readLW link

The Unified The­ory of Nor­ma­tive Ethics

Thane RuthenisJun 17, 2022, 7:55 PM
8 points
0 comments6 min readLW link

Towards Gears-Level Un­der­stand­ing of Agency

Thane RuthenisJun 16, 2022, 10:00 PM
25 points
4 comments18 min readLW link

Poorly-Aimed Death Rays

Thane RuthenisJun 11, 2022, 6:29 PM
48 points
5 comments4 min readLW link

Re­shap­ing the AI Industry

Thane RuthenisMay 29, 2022, 10:54 PM
147 points
35 comments21 min readLW link

Agency As a Nat­u­ral Abstraction

Thane RuthenisMay 13, 2022, 6:02 PM
55 points
9 comments13 min readLW link