How useful would it be to work on a problem where what the LM “knows” cannot be superhuman, but where it still knows how to do well and needs to be incentivized to do so? A currently prominent example is that LMs produce “toxic” content: https://lilianweng.github.io/lil-log/2021/03/21/reducing-toxicity-in-language-models.html