David Johnston
Karma: 492
A brief theory of why we think things are good or bad
David Johnston · 20 Oct 2024 20:31 UTC · 7 points · 10 comments · 1 min read
Mechanistic Anomaly Detection Research Update
Nora Belrose and David Johnston · 6 Aug 2024 10:33 UTC · 11 points · 0 comments · 1 min read · (blog.eleuther.ai)
Opinion merging for AI control
David Johnston · 4 May 2023 2:43 UTC · 6 points · 0 comments · 11 min read
[Question] Is it worth avoiding detailed discussions of expectations about agency levels of powerful AIs?
David Johnston · 16 Mar 2023 3:06 UTC · 11 points · 6 comments · 2 min read
How likely are malign priors over objectives? [aborted WIP]
David Johnston · 11 Nov 2022 5:36 UTC · −1 points · 0 comments · 8 min read
When can a mimic surprise you? Why generative models handle seemingly ill-posed problems
David Johnston · 5 Nov 2022 13:19 UTC · 8 points · 4 comments · 16 min read
There’s probably a tradeoff between AI capability and safety, and we should act like it
David Johnston · 9 Jun 2022 0:17 UTC · 3 points · 3 comments · 1 min read
Is evolutionary influence the mesa objective that we’re interested in?
David Johnston · 3 May 2022 1:18 UTC · 3 points · 2 comments · 5 min read
[Cross-post] Half baked ideas: defining and measuring Artificial Intelligence system effectiveness
David Johnston · 5 Apr 2022 0:29 UTC · 2 points · 0 comments · 7 min read
[Question] Are there any impossibility theorems for strong and safe AI?
David Johnston · 11 Mar 2022 1:41 UTC · 5 points · 3 comments · 1 min read
Counterfactuals from ensembles of peers
David Johnston · 4 Jan 2022 7:01 UTC · 3 points · 4 comments · 7 min read