This part really resonates:
suppose most of the abilities we care about, when we use the term “AGI,” are locked away in the very last tiny sliver of loss just above the intrinsic entropy of text: in the final 0.00[...many extra zeros...]1 bits/character, in a loss difference so tiny we’d need vastly larger validation sets for it to be distinguishable from data-sampling noise.
as does the ecological “road not taken”. But I think part of this puzzle is that, in fact, there aren’t adequate ecological measures of linguistic competence, only tasks that may be difficult but are always narrow, and where you never feel good about them being difficult for the right reasons. There are certainly no artificial environments where realistic linguistic competence is clearly needed to win. You hear things like, “but perplexity correlates with human judgement!”, but those correlations are always weak and unsatisfying. I think many NLP/DL researchers believe this and just don’t know what to do about it. The road is not readily taken. I think this is a big part of why discussions of LM performance can be so unsatisfying.
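To put a rough number on the validation-set point in the quote: a minimal back-of-the-envelope sketch, assuming per-character losses are i.i.d. with an illustrative standard deviation of about 2 bits/char (a guess, not a measured value; real text is correlated, so the true requirement is larger still). The standard error of a measured mean loss on n characters is σ/√n, so resolving a gap of Δ bits/char at about two standard errors needs roughly n ≈ (2σ/Δ)² characters; a paired comparison of two models on the same data could need fewer, but the 1/Δ² scaling is what makes tiny gaps so expensive to measure.

```python
def chars_needed(delta_bits, sigma_bits=2.0, z=2.0):
    """Very rough validation-set size (in characters) at which a loss gap of
    `delta_bits` bits/char would exceed ~z standard errors of the measured
    mean loss.  Treats per-character losses as i.i.d. with standard deviation
    `sigma_bits` (an assumed, illustrative figure)."""
    # standard error of the mean is sigma / sqrt(n); require z * SE <= delta
    return (z * sigma_bits / delta_bits) ** 2

for delta in (1e-2, 1e-4, 1e-6):
    print(f"gap of {delta:g} bits/char needs ~{chars_needed(delta):.1e} validation characters")
```

Under these assumptions a 0.01 bits/char gap is measurable with ~1.6e5 characters, but a 1e-6 bits/char gap already needs on the order of 1.6e13.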