I’ve curated this post for these reasons:
You put solid work into understanding central ideas in alignment, and published four posts simultaneously (the other three: 1, 2, 3) communicating some key intuitions you’ve picked up. Doing this communicative work is really valuable and something I want to celebrate.
The posts helped me develop clear intuitions about corrigibility, approval-directed agents, and agent-foundations.
The posts are short, concrete, and easy to understand, unlike basically all the rest of the writing about AGI alignment.
My biggest hesitations with curating this post:
I am somewhat hesitant to share simple intuition pumps about important topics, in case those intuition pumps are misleading.
I had a real hard time deciding which of the four to pick, and went with the one I thought people would find the clearest to read.
I was really excited that you wrote these posts, and learned a lot from them (plus the ensuing discussion in the comments).
"I am somewhat hesitant to share simple intuition pumps about important topics, in case those intuition pumps are misleading."

This sounds wrong to me. Do you expect considering such things freely to be misleading on net? I expect some intuition pumps to be misleading, but considering all of the intuitions we can find about a situation seems better than avoiding them.
I feel like big simplifications of complex ideas often just convey the wrong thing, and I was vaguely worried that, in a field dominated by writing that is hard to read, things that are easy to understand will dominate the conversation even if they’re pretty misguided. It’s not a big worry for me here, but it was the biggest hesitation I had.
Not sure what Ben meant, but my own take is: "sharing is fine, but intuition pumps without rigor backing them are not something we should curate regularly as an exemplar of what LW is trying to be."
Thanks a lot Ben! =D
On that note, Paul has recently written a blog post clarifying that his notion of “misaligned AI” does not coincide with what I wrote about here.