Yeah the LW team has been doing this sort of thing internally, still in the experimental phase. I don’t know if we’ve used all the tricks listed here yet.
I think one underused trick for training LLMs is to explicitly “edit” them. That is, suppose they generate some text X in response to prompt Y, and it has some error or is missing something. In that case you can create a text X’ that fixes this problem, and do a gradient update to increase log[P(X’|Y)/P(X|Y)].
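Concretely, a minimal sketch of what one such “edit” update could look like with PyTorch + HuggingFace transformers (the model name, prompt handling, and hyperparameters are placeholders of mine, not anything anyone actually runs):

```python
# Sketch only: one "edit" gradient step that raises log P(X'|Y) and lowers
# log P(X|Y), i.e. increases log[P(X'|Y)/P(X|Y)]. Model/tokenizer are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in for the real model
tok = AutoTokenizer.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def seq_logprob(prompt: str, completion: str) -> torch.Tensor:
    """Sum of log P(completion token | prompt + earlier completion tokens)."""
    p_ids = tok(prompt, return_tensors="pt").input_ids
    c_ids = tok(completion, return_tensors="pt").input_ids
    ids = torch.cat([p_ids, c_ids], dim=1)
    logits = model(ids).logits[:, :-1]                  # position t predicts token t+1
    token_logps = torch.log_softmax(logits, dim=-1).gather(
        -1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logps[:, p_ids.shape[1] - 1:].sum()    # completion positions only

def edit_update(Y: str, X: str, X_edited: str) -> None:
    """One gradient step on -[log P(X_edited|Y) - log P(X|Y)]."""
    loss = seq_logprob(Y, X) - seq_logprob(Y, X_edited)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice one might want to down-weight or clip the −log P(X|Y) term so the update doesn’t simply drive the original completion to zero probability, but that is a tuning detail, not part of the trick as stated.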
For example, if we generate virtual comments in the style of certain LW users, one could either let those users browse the virtual comments that have been created in their style and correct them, or one could let the people who receive the virtual comments edit them to remove misunderstanding or similar.
You could do that, but it adds a lot of extra hassle and problems, and I don’t think this is a ‘big enough’ application to justify the overhead or elicit enough corrections.
If the comments are generated LW-side and only upvoted/downvoted by the author, that uses no additional functionality and has no security problems; if you let authors edit arbitrary comments by other users on their drafts, which would be new functionality (I’m not aware of any place where you get to edit other users’ comments), then you suddenly have a potential security vulnerability. You also now have to track and save those edited versions somewhere as part of your corpus to finetune with, instead of simply slurping out the public database the way any other consumer, like GreaterWrong, does.
And then who would do so? Authors would have to do a lot of work to edit comments to, perhaps, one day slightly improve the feedback on others’ posts—not too incentive-compatible, nor even clearly a good use of time compared to working on their own post. (After all, presumably if the comments are bad in an easily fixed way based on the current draft, then the LLM training on the final version of that draft should get a lot of the value of the edits.) It is hard enough work to simply write a post or read comments on it and edit the post based on it for the intended human readers; I can’t imagine having such a surplus of time & energy I’d be able to go and rewrite all of the virtual comments just for future specialized LW-only LLM finetunes.
I look forward to seeing some of this integrated into LW!
I spend a fair amount of time writing comments that are probably pretty obvious to experienced LWers. I think this is important for getting newer people on-board with historical ideas and arguments. This includes explaining to enthusiastic newbies why they’re being voted into oblivion. Giving them some simulated expert comments would be great.
I think we’d see a large improvement in how useful new users’ early posts are, and therefore how much they’re encouraged to help vs. turned away by downvotes because they’re not really contributing. It would be a great way to let new posters do some efficient due diligence without catching up on the entire distributed history of discussion on their chosen topic.
I think it would also be useful for the most knowledgeable authors to have an LLM with cached context/hidden state from the best LW posts and comment sections giving virtual comments before publication.
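A rough sketch of the “cached context/hidden state” part, under my own assumptions (a HuggingFace-style causal LM, placeholder model and prompt format, nothing to do with LW’s actual internals): encode the exemplar material once, keep the transformer KV cache, and reuse it for each new draft.

```python
# Illustrative sketch only (my assumption of how "cached context/hidden state"
# could work, not a described implementation): encode the exemplar LW posts once,
# keep the transformer KV cache, and reuse it to comment on each new draft.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()   # placeholder model
tok = AutoTokenizer.from_pretrained("gpt2")

best_of_lw = "..."   # concatenated exemplary posts + comment sections (placeholder)
with torch.no_grad():
    corpus_ids = tok(best_of_lw, return_tensors="pt").input_ids
    corpus_cache = model(corpus_ids, use_cache=True).past_key_values   # paid once

def virtual_comment(draft: str, max_new_tokens: int = 200) -> str:
    """Greedily decode a comment on `draft`, reusing the precomputed corpus cache."""
    ids = tok("\n\nDRAFT:\n" + draft + "\n\nCOMMENT:\n", return_tensors="pt").input_ids
    past = copy.deepcopy(corpus_cache)   # don't mutate the shared cache across calls
    out_tokens = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(ids, past_key_values=past, use_cache=True)
            past = out.past_key_values
            next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
            out_tokens.append(next_id.item())
            ids = next_id
    return tok.decode(out_tokens)
```

Hosted-API prompt caching would serve the same purpose; the point is only that the expensive encoding of the reference corpus is paid once rather than once per draft.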
I’d love to see whatever you’ve got going internally as an optional feature (if it doesn’t cost too much) rather than wait for a finished, integrated feature.