I think one underused trick for training LLMs is to explicitly “edit” their outputs. That is, suppose a model generates some text X in response to prompt Y, and X has some error or is missing something. In that case you can create a text X’ that fixes the problem, and do a gradient update to increase log [P(X’|Y)/P(X|Y)], i.e. log P(X’|Y) − log P(X|Y).
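To make the update concrete, here is a minimal sketch of one gradient step on log P(X’|Y) − log P(X|Y). Everything in it is an assumption for illustration: `ToyBigramLM` is a hypothetical stand-in for an LLM (each token depends only on the previous one), and `edit_update`, the vocabulary, and the learning rate are made up. It also omits any regularization, such as a KL penalty toward the original model, that a real setup would likely want.

```python
import math
from collections import defaultdict


def log_softmax(logits):
    """Numerically stable log-softmax over a {token: logit} dict."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {tok: v - log_z for tok, v in logits.items()}


class ToyBigramLM:
    """Hypothetical stand-in for an LLM: one logit per (previous token, next token)."""

    def __init__(self, vocab):
        self.vocab = list(vocab)
        self.logits = {p: {t: 0.0 for t in self.vocab} for p in self.vocab}

    def log_prob(self, prompt, response):
        """log P(response | prompt), conditioning only on the previous token."""
        total, prev = 0.0, prompt[-1]
        for tok in response:
            total += log_softmax(self.logits[prev])[tok]
            prev = tok
        return total

    def grad_log_prob(self, prompt, response):
        """Gradient of log P(response | prompt) with respect to every logit touched."""
        grad, prev = defaultdict(float), prompt[-1]
        for tok in response:
            lp = log_softmax(self.logits[prev])
            for cand in self.vocab:
                # d log softmax(z)[tok] / d z[cand] = 1[cand == tok] - softmax(z)[cand]
                grad[(prev, cand)] += (1.0 if cand == tok else 0.0) - math.exp(lp[cand])
            prev = tok
        return grad


def edit_update(model, prompt, original, edited, lr=0.5):
    """One gradient-ascent step on log P(edited | prompt) - log P(original | prompt)."""
    g_edit = model.grad_log_prob(prompt, edited)
    g_orig = model.grad_log_prob(prompt, original)
    for prev, tok in set(g_edit) | set(g_orig):
        delta = g_edit.get((prev, tok), 0.0) - g_orig.get((prev, tok), 0.0)
        model.logits[prev][tok] += lr * delta


def log_ratio(model, prompt, original, edited):
    return model.log_prob(prompt, edited) - model.log_prob(prompt, original)


# Y = prompt, X = flawed generation, X' = human-corrected version (all made up).
model = ToyBigramLM(["the", "sky", "is", "green", "blue"])
prompt, original, edited = ["the", "sky"], ["is", "green"], ["is", "blue"]

before = log_ratio(model, prompt, original, edited)
edit_update(model, prompt, original, edited)
after = log_ratio(model, prompt, original, edited)
print(f"log-ratio before: {before:.3f}, after: {after:.3f}")
```

Note that the gradients for the shared prefix (“is”) cancel, so the update only moves probability mass at the position where X and X’ differ, which is part of the appeal of training on an edit rather than on X’ alone.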
For example, if we generate virtual comments in the style of certain LW users, one could either let those users browse the virtual comments created in their style and correct them, or let the people who receive the virtual comments edit them to remove misunderstandings and similar errors.
You could do that, but it adds a lot of hassle and problems, and I don’t think this is a ‘big enough’ application to justify the overhead or to elicit enough corrections.
If the comments are generated LW-side and only upvoted/downvoted by the author, that uses no additional functionality and has no security problems. If you let authors edit arbitrary comments by other users on their drafts, that would be new functionality (I’m not aware of any place where you get to edit other users’ comments), and now you suddenly have a potential security vulnerability. You also now have to track and save those edited versions somewhere as part of your finetuning corpus, as opposed to simply slurping out the public database the way any other user, such as GreaterWrong, does.
And then who would do the editing? Authors would have to do a lot of work to edit comments in order to, perhaps, one day slightly improve the feedback on others’ posts. That is not very incentive-compatible, nor even clearly a good use of time compared to working on their own post. (After all, if the comments are bad in a way that is easily fixed based on the current draft, then an LLM trained on the final version of that draft should presumably capture a lot of the value of the edits.) It is hard enough work simply to write a post, read the comments on it, and edit the post accordingly for the intended human readers; I can’t imagine having such a surplus of time & energy that I’d be able to go and rewrite all of the virtual comments just for future specialized LW-only LLM finetunes.