David Johnston

Karma: 510

A brief theory of why we think things are good or bad

David Johnston · Oct 20, 2024, 8:31 PM
7 points
10 comments · LW link

Mechanistic Anomaly Detection Research Update

Aug 6, 2024, 10:33 AM
11 points
0 comments · 1 min read · LW link
(blog.eleuther.ai)

Opinion merging for AI control

David Johnston · May 4, 2023, 2:43 AM
6 points
0 comments · 11 min read · LW link

[Question] Is it worth avoiding detailed discussions of expectations about agency levels of powerful AIs?

David Johnston · Mar 16, 2023, 3:06 AM
11 points
6 comments · 2 min read · LW link

How likely are malign priors over objectives? [aborted WIP]

David Johnston · Nov 11, 2022, 5:36 AM
−1 points
0 comments · 8 min read · LW link

When can a mimic surprise you? Why generative models handle seemingly ill-posed problems

David Johnston · Nov 5, 2022, 1:19 PM
8 points
4 comments · 16 min read · LW link

There’s probably a tradeoff between AI capability and safety, and we should act like it

David Johnston · Jun 9, 2022, 12:17 AM
3 points
3 comments · 1 min read · LW link

Is evolutionary influence the mesa objective that we’re interested in?

David Johnston · May 3, 2022, 1:18 AM
3 points
2 comments · 5 min read · LW link

[Cross-post] Half baked ideas: defining and measuring Artificial Intelligence system effectiveness

David Johnston · Apr 5, 2022, 12:29 AM
2 points
0 comments · 7 min read · LW link

[Question] Are there any impossibility theorems for strong and safe AI?

David Johnston · Mar 11, 2022, 1:41 AM
5 points
3 comments · 1 min read · LW link

Counterfactuals from ensembles of peers

David Johnston · Jan 4, 2022, 7:01 AM
3 points
4 comments · 7 min read · LW link