JanB · Karma: 813
Paper in Science: Managing extreme AI risks amid rapid progress
JanB · 23 May 2024 8:40 UTC · 50 points · 2 comments · 1 min read · LW link

I don’t find the lie detection results that surprising (by an author of the paper)
JanB · 4 Oct 2023 17:10 UTC · 97 points · 8 comments · 3 min read · LW link

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
JanB, Owain_Evans and SoerenMind · 28 Sep 2023 18:53 UTC · 185 points · 38 comments · 3 min read · LW link

Looking for an alignment tutor
JanB · 17 Dec 2022 19:08 UTC · 15 points · 2 comments · 1 min read · LW link

[LINK] - ChatGPT discussion
JanB · 1 Dec 2022 15:04 UTC · 13 points · 8 comments · 1 min read · LW link (openai.com)

Research request (alignment strategy): Deep dive on “making AI solve alignment for us”
JanB · 1 Dec 2022 14:55 UTC · 16 points · 3 comments · 1 min read · LW link

Your posts should be on arXiv
JanB · 25 Aug 2022 10:35 UTC · 155 points · 44 comments · 3 min read · LW link

Some ideas for follow-up projects to Redwood Research’s recent paper
JanB · 6 Jun 2022 13:29 UTC · 10 points · 0 comments · 7 min read · LW link

[Question] What is the difference between robustness and inner alignment?
JanB · 15 Feb 2020 13:28 UTC · 9 points · 2 comments · 1 min read · LW link

[Question] Does iterated amplification tackle the inner alignment problem?
JanB · 15 Feb 2020 12:58 UTC · 7 points · 4 comments · 1 min read · LW link

The expected value of extinction risk reduction is positive
JanB · 9 Jun 2019 15:49 UTC · 23 points · 0 comments · 61 min read · LW link