
JanB

Karma: 826

Paper in Science: Managing extreme AI risks amid rapid progress

JanB · May 23, 2024, 8:40 AM
50 points
2 comments · 1 min read · LW link

I don’t find the lie detection results that surprising (by an author of the paper)

JanB · Oct 4, 2023, 5:10 PM
97 points
8 comments · 3 min read · LW link

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Sep 28, 2023, 6:53 PM
187 points
39 comments · 3 min read · LW link · 1 review

Looking for an alignment tutor

JanB · Dec 17, 2022, 7:08 PM
15 points
2 comments · 1 min read · LW link

[LINK] - ChatGPT discussion

JanB · Dec 1, 2022, 3:04 PM
13 points
8 comments · 1 min read · LW link
(openai.com)

Research request (alignment strategy): Deep dive on “making AI solve alignment for us”

JanB · Dec 1, 2022, 2:55 PM
16 points
3 comments · 1 min read · LW link

Your posts should be on arXiv

JanB · Aug 25, 2022, 10:35 AM
156 points
44 comments · 3 min read · LW link

Some ideas for follow-up projects to Redwood Research’s recent paper

JanB · Jun 6, 2022, 1:29 PM
10 points
0 comments · 7 min read · LW link

[Question] What is the difference between robustness and inner alignment?

JanB · Feb 15, 2020, 1:28 PM
9 points
2 comments · 1 min read · LW link

[Question] Does iterated amplification tackle the inner alignment problem?

JanB · Feb 15, 2020, 12:58 PM
7 points
4 comments · 1 min read · LW link

The expected value of extinction risk reduction is positive

JanB · Jun 9, 2019, 3:49 PM
23 points
0 comments · 61 min read · LW link