How useful would it be to work on a problem where what the LM “knows” cannot be superhuman, but where it still knows how to do well and needs to be incentivized to do so? A currently prominent example is that LMs produce “toxic” content: https://lilianweng.github.io/lil-log/2021/03/21/reducing-toxicity-in-language-models.html