I finished reading all the conversations a few hours ago. I have no follow-up questions (except maybe “now what?”); I’m still updating from all those words.
One excerpt in particular, from the latest post, jumped out at me (from Eliezer Yudkowsky, emphasis mine):
This is not aimed particularly at you, but I hope the reader may understand something of why Eliezer Yudkowsky goes about sounding so gloomy all the time about other people’s prospects for noticing what will kill them, by themselves, without Eliezer constantly hovering over their shoulder every minute prompting them with almost all of the answer.
The past years of reading about alignment have left me with an intense initial distrust of any alignment research agenda. Maybe it’s ordinary paranoia, maybe something more. I’ve not come up with any new ideas myself, and I’m not particularly confident in my ability to find flaws in someone else’s proposal (what if I’m not smart enough to understand them properly? What if I make things even more confused and waste everyone’s time?).
After thousands and thousands of lengthy conversations where it takes everyone ages to understand where threat models disagree, why some avenue of research is promising or not, and what is behind words (there was a whimper in my mind when the meaning/usage of corrigibility was discussed, as if this whole time experts had been talking past each other)...
… after all that, I get this strong urge to create something like Arbital to explain everything. Or maybe something simpler, like Stampy. I don’t know if it would help much; the confusion is just very frustrating. When I’m facilitating discussions, trying to bring more people into the field, I insist on how not-settled many posts are, and on the kinds of failure modes you have to watch out for.
This also gives me an extra push to try harder, publish more, and ask more questions, because I’m getting more desperate to make progress. So, thank you for publishing this sequence.