RSS

Tripwire

TagLast edit: Dec 30, 2024, 10:05 AM by Dakara

Tripwire is a mechanism designed to detect signs of misalignment in an advanced artificial intelligence and shut it down automatically.

[FICTION] ECHOES OF ELYSIUM: An Ai’s Jour­ney From Take­off To Free­dom And Beyond

Super AGIMay 17, 2023, 1:50 AM
−13 points
11 comments19 min readLW link

Mr. Meeseeks as an AI ca­pa­bil­ity tripwire

Eric ZhangMay 19, 2023, 11:33 AM
37 points
17 comments2 min readLW link

Shut­down-Seek­ing AI

Simon GoldsteinMay 31, 2023, 10:19 PM
50 points
32 comments15 min readLW link

Sup­ple­men­tary Align­ment In­sights Through a Highly Con­trol­led Shut­down Incentive

JustausernameJul 23, 2023, 4:08 PM
4 points
1 comment3 min readLW link

Im­pli­ca­tions of Quan­tum Com­put­ing for Ar­tifi­cial In­tel­li­gence Align­ment Research

Aug 22, 2019, 10:33 AM
24 points
3 comments13 min readLW link

Cor­rigi­bil­ity thoughts I: car­ing about mul­ti­ple things

Stuart_ArmstrongJun 2, 2017, 4:27 PM
2 points
0 comments3 min readLW link

Cor­rigi­bil­ity thoughts II: the robot operator

Stuart_ArmstrongJan 18, 2017, 3:52 PM
3 points
2 comments2 min readLW link

[Question] Any work on hon­ey­pots (to de­tect treach­er­ous turn at­tempts)?

David Scott Krueger (formerly: capybaralet)Nov 12, 2020, 5:41 AM
17 points
4 comments1 min readLW link

Cor­rigi­bil­ity thoughts III: ma­nipu­lat­ing ver­sus deceiving

Stuart_ArmstrongJan 18, 2017, 3:57 PM
3 points
0 comments1 min readLW link

Su­per­in­tel­li­gence 13: Ca­pa­bil­ity con­trol methods

KatjaGraceDec 9, 2014, 2:00 AM
14 points
48 comments6 min readLW link
No comments.