You don’t want to reverse-sort, though. While that would prioritize the ‘update’ comments like “this failed to replicate” or “this was the foundation of a whole new ML research area”, it would make a hash of the older comments, which will be nonsensical as you are reading the reply to a reply to a reply … to a reply to the post. (Even if the old posts had tree comments, rather than forcibly being linearized for lack of threading at that time, reverse-sorting the top-level comments would be weird and confusing.)
You could try to use an LLM to classify comments (or post + comment) by ‘updateness’, and simply put ‘update’ comments in a separate block of newest-first comments at the start (while sorting the rest in the usual oldest-first way), and that might work. (The separator could be an explicit section header/bold label, or just a horizontal ruler with the difference left implicit.) Few-shot it with a few dozen examples, collected by looking at new comments on old posts; I bet a nice cheap LLM like GPT-4o or Claude-3.5-sonnet can handle it without a problem. It's a standard ‘I know it when I see it’ semantic property of the kind that few-shot LLMs work well for.
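To make the two-block idea concrete, here is a rough Python sketch. Everything named in it (`Comment`, `build_prompt`, `arrange`, the UPDATE/DISCUSSION labels) is made up for illustration, and the actual LLM call is deliberately left as a plain callable you would supply yourself; this is not how any site actually implements it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple


@dataclass
class Comment:
    # Hypothetical minimal comment record, for illustration only.
    author: str
    posted: datetime
    text: str


def build_prompt(examples: List[Tuple[str, bool]], post: str, comment: str) -> str:
    """Assemble a few-shot prompt from hand-labelled (comment text, is_update) pairs."""
    lines = ["Label each comment as UPDATE or DISCUSSION.", ""]
    for text, is_update in examples:
        label = "UPDATE" if is_update else "DISCUSSION"
        lines += [f"Comment: {text}", f"Label: {label}", ""]
    # The comment to classify goes last, with the label left blank for the model.
    lines += [f"Post: {post}", f"Comment: {comment}", "Label:"]
    return "\n".join(lines)


def arrange(
    comments: List[Comment], is_update: Callable[[Comment], bool]
) -> Tuple[List[Comment], List[Comment]]:
    """Split comments into (updates newest-first, everything else oldest-first).

    `is_update` stands in for the LLM classifier; any callable will do.
    """
    updates = [c for c in comments if is_update(c)]
    rest = [c for c in comments if not is_update(c)]
    updates.sort(key=lambda c: c.posted, reverse=True)
    rest.sort(key=lambda c: c.posted)
    return updates, rest
```

You would render the first list, then the separator, then the second list. Keeping the classifier as a parameter means you can swap in a keyword stub for testing before paying for any model calls.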
A simpler thing might be to say that what Randall identifies is a natural kind: there are ‘old’ comments and there are ‘new’ comments, and they are fundamentally different. A heuristic might be: comments on a post within a year of posting are ‘old’, and comments after that are ‘new’; as before, ‘new’ comments get put in a separate section sorted newest-first at the beginning, then ‘old’ comments get sorted oldest-first. If that heuristic doesn’t look good, you could look for a cutpoint: the largest temporal gap between 2 successive top-level comments. Then everything after that gap is ‘new’ and everything before it is ‘old’. (Usually, if there is a burst of comments on posting, and then someone revisits it long afterwards to update it, the largest gap will be somewhere in the ‘new’ subsequence, so this will be conservative in creating out-of-order comments and show only the newest.)
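Both heuristics are only a few lines each. A minimal sketch, working on bare timestamps of top-level comments; the function names and the 365-day default are my own assumptions, not anything the site provides:

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_by_age(
    post_date: datetime,
    times: List[datetime],
    cutoff: timedelta = timedelta(days=365),
) -> Tuple[List[datetime], List[datetime]]:
    """Fixed cutoff: return (new newest-first, old oldest-first)."""
    ordered = sorted(times)
    old = [t for t in ordered if t - post_date <= cutoff]
    new = [t for t in ordered if t - post_date > cutoff]
    return new[::-1], old


def split_by_largest_gap(
    times: List[datetime],
) -> Tuple[List[datetime], List[datetime]]:
    """Cutpoint: split at the largest gap between successive comments."""
    ordered = sorted(times)
    if len(ordered) < 2:
        return [], ordered  # nothing to split
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # On ties, index() picks the earliest of the largest gaps.
    cut = gaps.index(max(gaps)) + 1
    return ordered[cut:][::-1], ordered[:cut]
```

For a post from 2009-01-01 with comments on 2009-01-02, 2009-01-05, 2012-06-01 and 2020-03-01, the one-year cutoff would put the 2012 and 2020 comments in the ‘new’ block, while the largest-gap version puts only the 2020 one there, which is the conservative behavior described above. Note that the gap version always splits somewhere, even when every comment arrived in the initial burst, so in practice you would probably also want some minimum gap before treating anything as ‘new’.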
Cheers, yeah, something like this might be the way to go.
I definitely don’t want newest first. The magic (new and upvoted) sort seems to work well. The default (top scoring) sort too. A concrete example is this post on “an especially elegant evpsych experiment”. It’s not the newest, but it is top-scoring.
I think an AI could reasonably convert ancient discussions to light use of threads to preserve most of the conversation flow, where it has value.