Idea for LLM support for writing LessWrong posts: virtual comments.
Back in August I discussed a bit with Rafe & Oliver how to integrate LLMs into LW2 in ways which aren’t awful and which encourage improvement—particularly using the new ‘prompt caching’ feature. To summarize one idea: we can use long-context LLMs with prompt caching to try to simulate various LW users of diverse perspectives to write useful feedback on drafts for authors.
(Prompt caching (eg) is the Transformer version of the old RNN hidden-state caching trick, where you run an input through the (deterministic) NN, and then save the intermediate state, and apply that to arbitrarily many future inputs, to avoid recomputing the first input each time, which is the naive way to do it. You can think of it as a lightweight finetuning. This is particularly useful if you are thinking about having large generic prompts—such as an entire corpus. A context window of millions of tokens might take up to a minute & $1 to compute currently, so you definitely need to be careful and don’t want to compute it more than once.)
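To make the mechanism concrete, here is a rough sketch of KV-cache reuse with Hugging Face transformers (the model is just a stand-in and cache handling varies across library versions; hosted APIs expose the same idea as a “cache this prefix” flag):

```python
# Rough sketch: pay for the long shared prefix once, then sample many
# cheap continuations that reuse its KV cache.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

shared_prefix = "<the big generic prompt, e.g. an entire corpus>"
prefix_ids = tok(shared_prefix, return_tensors="pt").input_ids

# Save the intermediate state (the KV cache) -- the Transformer analogue
# of saving an RNN's hidden state.
with torch.no_grad():
    prefix_cache = model(prefix_ids, past_key_values=DynamicCache(),
                         use_cache=True).past_key_values

def continue_from_prefix(suffix: str, max_new_tokens: int = 50) -> str:
    """Sample a continuation of shared_prefix + suffix without re-encoding the prefix."""
    full_ids = torch.cat(
        [prefix_ids, tok(suffix, return_tensors="pt").input_ids], dim=-1)
    out = model.generate(
        full_ids,
        past_key_values=copy.deepcopy(prefix_cache),  # reuse a copy; don't mutate the saved cache
        max_new_tokens=max_new_tokens,
        do_sample=True,
    )
    return tok.decode(out[0, full_ids.shape[-1]:], skip_special_tokens=True)

# Arbitrarily many cheap continuations off one expensive prefix:
print(continue_from_prefix("\n\nDraft A: ..."))
print(continue_from_prefix("\n\nDraft B: ..."))
```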
One idea would be to try to use LLMs to offer feedback on drafts or articles. Given that tuned LLM feedback from Claude or ChatGPT is still not that great, tending towards sycophancy or obviousness or ChatGPTese, it is hardly worthwhile running a post through a generic “criticize this essay” prompt. (If anyone on LW2 wanted to do such a thing, they are surely capable of doing it themselves, and integrating it into LW2 isn’t that useful. Removing the friction might be helpful, but it doesn’t seem like it would move any needles.)
So, one way to elicit more interesting feedback would be to force LLMs out of the chatbot-assistant mode-collapse and into more interesting simulations. There has been some success with just suggestively-named personas or characters in dialogues (you could imagine here we’d have “Skeptic” or “Optimist” characters), but we can do better. Since this is for LW2, we have an obvious solution: simulate LW users! We know that LW is in the training corpus of almost all LLMs and that writers on it (like myself) are well-known to LLMs (eg. truesight). So we can ask for feedback from simulated LWers: eg. Eliezer Yudkowsky or myself or Paul Christiano or the author or...
This could be done nicely by finetuning a “LW LLM” on all the articles & comments, with associated metadata like karma, and then feeding any new draft or article into it and sampling a comment from each persona. (This helps instill a lot of useful domain knowledge, but also, perhaps more importantly, helps override the mode-collapse and non-judgmentalness of assistant LLMs. Perhaps the virtual-gwern will not be as acerbic or disagreeable as the original, but we’ll take what we can get at this point...) If there is some obvious criticism or comment Eliezer Yudkowsky would make on a post, which even an LLM can predict, why not deal with it upfront instead of waiting for the real Eliezer to comment (which is also unlikely to ever happen these days)? And one can of course sample an entire comment tree of responses to a ‘virtual comment’, with the LLM predicting the logical respondents.
This can further incorporate the draft’s author’s full history, which will usually fit into a multi-million token context window. So their previous comments and discussions, full of relevant material, will get included. This prompt can be cached, and used to sample a bunch of comment-trees. (And if finetuning is infeasible, one can try instead to put the LW corpus into the context and prompt-cache that before adding in the author’s corpus.)
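As a rough sketch of the no-finetuning variant, using Anthropic’s prompt-caching API (model name, file names, and prompt wording are my assumptions, not a finished design): mark the big author corpus as cacheable, then sample persona comments against it repeatedly.

```python
# Sketch only: the author's corpus goes in a cacheable system block, so later
# calls sharing that prefix reuse the cached computation instead of redoing it.
import anthropic

client = anthropic.Anthropic()
author_corpus = open("author_history.md").read()  # the author's prior posts & comments

def sample_virtual_comment(draft: str, persona: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model choice
        max_tokens=1024,
        system=[
            {"type": "text",
             "text": "You write LessWrong comments in the voice of named users."},
            {"type": "text",
             "text": author_corpus,
             # the expensive shared prefix: computed once, reused by later calls
             "cache_control": {"type": "ephemeral"}},
        ],
        messages=[{
            "role": "user",
            "content": f"Draft:\n{draft}\n\nWrite the top-level comment "
                       f"{persona} would leave on this draft.",
        }],
    )
    return resp.content[0].text

draft = open("draft.md").read()
for persona in ["Eliezer Yudkowsky", "gwern", "Paul Christiano"]:
    print(sample_virtual_comment(draft, persona))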
The default would be to prompt for high-karma responses. This might not work, because it might be too hard to generate high-quality responses blindly in a feedforward fashion, without any kind of search or filtering. So the data might instead be formatted with the metadata after each comment, for ranking purposes: the LLM generates a response and only then a karma score, and when we sample, we simply throw out predicted-low-score comments rather than waste the author’s time looking at them. (When it comes to these sorts of assistants, I strongly believe ‘quality > quantity’, and ‘silence is golden’. Better to waste some API bills than author time.)
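A minimal sketch of the ‘karma after the comment’ trick, assuming the finetuned model is trained to emit a trailing “KARMA: <n>” line (the threshold and the exact format are placeholders):

```python
import re

KARMA_THRESHOLD = 20  # arbitrary cut-off: 'silence is golden'

def parse_comment_and_karma(sample: str) -> tuple[str, int]:
    """Assumes the finetuning format puts metadata *after* the comment body, e.g.

        <comment text>
        KARMA: 37

    so the model commits to the comment before scoring it."""
    m = re.search(r"^KARMA:\s*(-?\d+)\s*$", sample, flags=re.MULTILINE)
    if not m:
        return sample.strip(), 0  # no score predicted: treat as not worth showing
    return sample[: m.start()].strip(), int(m.group(1))

def worth_showing(samples: list[str]) -> list[str]:
    """Throw out predicted-low-score comments rather than waste the author's time."""
    kept = []
    for s in samples:
        comment, karma = parse_comment_and_karma(s)
        if karma >= KARMA_THRESHOLD:
            kept.append(comment)
    return kept
```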
One can also target comments to specific kinds of feedback, to structure it better than a grab-bag of whatever the LLM happens to sample. It would be good to have (in descending order of how likely to be useful to the author) a ‘typo’ tree, a ‘copyediting’/‘style’/‘tone’ tree, ‘confusing part’, ‘terminology’, ‘related work’, ‘criticism’, ‘implications and extrapolations’, ‘abstract/summary’ (I know people hate writing those)… What else? (These are not natural LW comments, but you can easily see how to prompt for them with prompts like “$USER $KARMA $DATE | Typo: ”, etc.)
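For the targeted trees, the prompt construction might look something like this (the header follows the “$USER $KARMA $DATE | Typo: ” example above; the category list and default karma are placeholders):

```python
from datetime import date

# Hypothetical feedback categories; each seeds its own comment-tree.
FEEDBACK_KINDS = [
    "Typo", "Copyediting", "Confusing part", "Terminology",
    "Related work", "Criticism", "Implications and extrapolations",
    "Abstract/summary",
]

def feedback_prompt(draft: str, user: str, kind: str, karma: int = 50) -> str:
    """Build a completion prompt ending in a '$USER $KARMA $DATE | Kind: ' header,
    so the model continues with a comment of the requested kind."""
    header = f"{user} {karma} {date.today():%Y-%m-%d} | {kind}: "
    return f"{draft}\n\n---\n\n{header}"

draft = open("draft.md").read()
prompts = [feedback_prompt(draft, "gwern", kind) for kind in FEEDBACK_KINDS]
```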
As they are just standard LW comments, they can be attached to the post or draft like regular comments (is this possible? I’d think so, just transclude the comment-tree into the corresponding draft page) and responded to or voted on etc. (Downvoted comments can be fed back into the finetuning with low karma to discourage feedback like that.) Presumably at this point, it would not be hard to make it interactive, and allow the author to respond & argue with feedback. I don’t know how worthwhile this would be, and the more interaction there is, the harder it would be to hide the virtual comments after completion.
And when the author finishes writing & posts the draft, the virtual comments disappear (possibly entirely unread), having served their purpose as scaffolding to help improve the draft. (If the author really likes one, they can just copy it in or quote it, I’d think, which ensures they know they take full responsibility for it and can’t blame the machine for any mistakes or confabulations or opinions. But otherwise, I don’t see any real reason to make them visible to readers of the final post. If included at all, they should be prominently flagged—maybe the usernames are always prefixed by AI_$USER to ensure no one, including future LLMs, is confused—and definitely always sort to the bottom & be collapsed by default.)

Yeah, the LW team has been doing this sort of thing internally, still in the experimental phase. I don’t know if we’ve used all the tricks listed here yet.
I think one underused trick for training LLMs is to explicitly “edit” them. That is, suppose they generate some text X in response to prompt Y, and it has some error or is missing something. In that case you can create a text X’ that fixes this problem, and do a gradient update to increase log P(X’|Y)/P(X|Y).
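A minimal sketch of that update with a causal LM, assuming we just take one gradient step on the difference of completion log-probabilities (model, tokenizer, and learning rate are placeholders; a real setup would batch examples and regularize against drift):

```python
# Sketch of the 'edit' update: increase log P(X'|Y) - log P(X|Y) for a
# prompt Y, original output X, and human-edited output X'.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")    # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def completion_logprob(prompt: str, completion: str) -> torch.Tensor:
    """Sum of token log-probabilities of `completion` given `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, tok(completion, return_tensors="pt").input_ids], dim=-1)
    logprobs = torch.log_softmax(model(ids).logits[:, :-1], dim=-1)
    token_lp = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_ids.shape[-1] - 1:].sum()  # completion tokens only

def edit_update(y: str, x: str, x_edited: str) -> None:
    """Push probability mass from the model's original X toward the edited X'."""
    loss = -(completion_logprob(y, x_edited) - completion_logprob(y, x))
    opt.zero_grad()
    loss.backward()
    opt.step()
```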
For example, if we generate virtual comments in the style of certain LW users, one could either let those users browse the virtual comments that have been created in their style and correct them, or let the people who receive the virtual comments edit them to remove misunderstandings and the like.
You could do that, but it adds a lot of additional hassle and problems, and I don’t think this is a ‘big enough’ application to justify the overhead or elicit enough corrections.
If the comments are generated LW-side and only upvoted/downvoted by the author, that uses no additional functionality and has no security problems; if you let authors edit arbitrary comments by other users on their drafts, which would be new functionality (I’m not aware of any place where you get to edit other users’ comments), then you suddenly have a potential security vulnerability. You also now have to track and save those edited versions somewhere as part of your corpus to finetune with, as opposed to simply slurping out the public database like any other consumer of it, such as GreaterWrong.
And then who would do so? Authors would have to do a lot of work to edit comments to, perhaps, one day slightly improve the feedback on others’ posts—not too incentive-compatible, nor even clearly a good use of time compared to working on their own post. (After all, presumably if the comments are bad in an easily fixed way based on the current draft, then the LLM training on the final version of that draft should get a lot of the value of the edits.) It is hard enough work to simply write a post, or to read the comments on it and edit the post based on them, for the intended human readers; I can’t imagine having such a surplus of time & energy that I’d be able to go and rewrite all of the virtual comments just for future specialized LW-only LLM finetunes.
I look forward to seeing some of this integrated into LW!
I spend a fair amount of time writing comments that are probably pretty obvious to experienced LWers. I think this is important for getting newer people on-board with historical ideas and arguments. This includes explaining to enthusiastic newbies why they’re being voted into oblivion. Giving them some simulated expert comments would be great.
I think we’d see a large improvement in how useful new users’ early posts are, and therefore how much they’re encouraged to help vs. turned away by downvotes because they’re not really contributing. It would be a great way to let new posters do some efficient due diligence without catching up on the entire distributed history of discussion on their chosen topic.
I think it would also be useful for the most knowledgeable authors to have an LLM with cached context/hidden state from the best LW posts and comment sections giving virtual comments before publication.
I’d love to see whatever you’ve got going internally as an optional feature (if it doesn’t cost too much) rather than wait for a finished, integrated feature.
Can we add retrieval augmentation to this? Something that, as you are writing your article, goes: “Have you read this other article?”
At least in theory, the comments, particularly a ‘related work’ comment-tree, would do that already by talking about other LW articles as relevant. (All of which the LLM should know by heart due to the finetuning.)
Might not work out of the box, of course, in which case you could try to fix that. You could do a regular nearest-neighbors-style lookup and just send that to the author as a comment (“here are the 20 most similar LW articles:”); or you could elaborate the virtual comments by adding a retrieval step and throwing metadata about articles ‘similar’ to the draft into the prompt, so the generated comments are much more likely to reference them.
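A rough sketch of the nearest-neighbors fallback, assuming an off-the-shelf sentence embedder and an in-memory list of LW articles (embedding model and data layout are illustrative):

```python
# Embed the LW corpus once, then surface the k most similar articles to a
# draft, either as a bare "related work" comment or as extra metadata for
# the virtual-comment prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

articles = [
    # {"title": "...", "url": "...", "text": "..."}, one dict per LW article
]
article_vecs = embedder.encode([a["text"] for a in articles],
                               normalize_embeddings=True)

def related_articles(draft: str, k: int = 20) -> list[dict]:
    draft_vec = embedder.encode([draft], normalize_embeddings=True)[0]
    sims = article_vecs @ draft_vec  # cosine similarity (vectors are unit-norm)
    return [articles[i] for i in np.argsort(-sims)[:k]]

def related_work_comment(draft: str) -> str:
    lines = [f"- [{a['title']}]({a['url']})" for a in related_articles(draft)]
    return "Here are the 20 most similar LW articles:\n" + "\n".join(lines)
```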
All of these ideas sound awesome and exciting, and precisely the right kind of use of LLMs that I would like to see on LW!