
David Johnston

Karma: 492

A brief theory of why we think things are good or bad

David Johnston, 20 Oct 2024 20:31 UTC
7 points
10 comments, 1 min read, LW link

Mechanistic Anomaly Detection Research Update

6 Aug 2024 10:33 UTC
11 points
0 comments, 1 min read, LW link
(blog.eleuther.ai)

Opinion merging for AI control

David Johnston, 4 May 2023 2:43 UTC
6 points
0 comments, 11 min read, LW link

[Question] Is it worth avoiding detailed discussions of expectations about agency levels of powerful AIs?

David Johnston, 16 Mar 2023 3:06 UTC
11 points
6 comments, 2 min read, LW link

How likely are malign priors over objectives? [aborted WIP]

David Johnston, 11 Nov 2022 5:36 UTC
−1 points
0 comments, 8 min read, LW link

When can a mimic surprise you? Why generative models handle seemingly ill-posed problems

David Johnston, 5 Nov 2022 13:19 UTC
8 points
4 comments, 16 min read, LW link

There’s probably a tradeoff between AI capability and safety, and we should act like it

David Johnston, 9 Jun 2022 0:17 UTC
3 points
3 comments, 1 min read, LW link

Is evolutionary influence the mesa objective that we’re interested in?

David Johnston, 3 May 2022 1:18 UTC
3 points
2 comments, 5 min read, LW link

[Cross-post] Half baked ideas: defining and measuring Artificial Intelligence system effectiveness

David Johnston, 5 Apr 2022 0:29 UTC
2 points
0 comments, 7 min read, LW link

[Question] Are there any impossibility theorems for strong and safe AI?

David Johnston, 11 Mar 2022 1:41 UTC
5 points
3 comments, 1 min read, LW link

Counterfactuals from ensembles of peers

David Johnston, 4 Jan 2022 7:01 UTC
3 points
4 comments, 7 min read, LW link