I’m really feeling this comment thread lately. It feels like there’s selective rationalism going on, many dissenting voices have given up on posting, and plenty of bad arguments keep getting signal-boosted. Most people here seem to share an unrealistic, contradictory world model that will cause almost every policy approach taken to fail utterly, as they have in the recent past. I’d largely describe the flawed world model as not appreciating the game-theoretic dynamics and ignoring any evidence that makes certain policy approaches impossible.
(Funnily enough, its traits remind me of an unaligned AI, since the world model almost seems to have developed a survival drive.)
IMO the next level-up in discourse is going to be when someone creates an LLM-moderated forum. The LLM will have a big public list of discussion guidelines in its context window. When you click “submit” on your comment, it will give your comment a provisional score (in lieu of a vote score) and tell you what you can do to improve it. The LLM won’t just tell you how to be more civil or rational. It will also say things like “hey, it looks like someone else already made that point in the comments—shall I upvote their comment for you, and extract the original portion of your comment as a new submission?” Or “back in 2013 it was argued that XYZ; your comment doesn’t seem to jibe with that. Thoughts?” Or “Point A was especially insightful, I like that!” Or “here’s a way you could rewrite this more briefly, more clearly, and less aggressively”. Or “here’s a counterargument someone might write; perhaps you should anticipate it?”
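To make the mechanism concrete, here’s a minimal sketch of what that submit-time loop could look like. Everything in it is hypothetical: `call_llm` is a stub standing in for whatever model API the forum actually used, and the score/suggestion format is just one plausible convention, not a real spec.

```python
# Hypothetical sketch of the submit-time moderation loop described above.
# `call_llm` is a stub; a real forum would call an actual LLM API here.

from dataclasses import dataclass, field

GUIDELINES = """\
1. Be civil.
2. Don't repeat points already made in the thread.
3. Anticipate obvious counterarguments.
"""


@dataclass
class Review:
    score: float  # provisional score, shown in lieu of a vote score
    suggestions: list[str] = field(default_factory=list)


def call_llm(prompt: str) -> str:
    """Stub for the model call; returns a canned response for the demo."""
    return (
        "SCORE: 0.6\n"
        "SUGGESTION: Someone already made point A; upvote their comment instead?"
    )


def review_comment(comment: str, thread: list[str]) -> Review:
    """Ask the LLM to score a draft comment against the guidelines and thread."""
    prompt = (
        f"Guidelines:\n{GUIDELINES}\n"
        "Existing thread:\n" + "\n---\n".join(thread) + "\n\n"
        f"Draft comment:\n{comment}\n\n"
        "Reply with 'SCORE: <0-1>' and zero or more 'SUGGESTION: <text>' lines."
    )
    review = Review(score=0.0)
    for line in call_llm(prompt).splitlines():
        if line.startswith("SCORE:"):
            review.score = float(line.split(":", 1)[1])
        elif line.startswith("SUGGESTION:"):
            review.suggestions.append(line.split(":", 1)[1].strip())
    return review


if __name__ == "__main__":
    review = review_comment("I think X, because Y.", ["Earlier comment arguing X."])
    print(f"Provisional score: {review.score}")
    for s in review.suggestions:
        print(f"- {s}")
```

In a real deployment the stub would be an actual LLM call and the guidelines would be the forum’s full public list; the point is just that the “provisional score plus concrete suggestions” loop is a small amount of glue code.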
The initial version probably won’t work well, but over time, with enough discussion and iteration on the guidelines, finetuning, etc., the discussion on that forum will become clearly superior. It’ll be the same sort of level-up we saw with Community Notes on X, or with the US court system compared to the mob rule you see on social media. Real-world humans have the problem that the more you feel you have a dog in the fight, the more you engage with the discussion, which makes politicization inevitable under online voting systems. The LLM is going to be like a superhumanly patient, neutral moderator, neutering the popularity-contest and ingroup/outgroup aspects of modern social media.
It sounds like an excellent lab for all possible alignment failures.
Out of curiosity, does that mean that if the app worked fairly well as described, you’d consider that an update toward alignment not being as hard as you thought? Or are you in the “only endpoints can be predicted” camp, such that this wouldn’t constitute any evidence?
BTW, I strongly suspect that YouTube cleaned up its comment section in recent years by using ML for comment ranking. It seems like a big improvement to me. You’ll notice that “crappy YouTube comments” isn’t as much of a meme as it once was.
I mean, I think I’m one of the people you disagree with a lot, but I think there’s something about the design of the upvote system that makes mild disagreement feel like intense rejection, so new folks quickly nope out. The people who stay are the ones who either get upvoted consistently or are impervious to the emotional impact of being downvoted.