I have my own benchmark of tasks that I think measure general reasoning, which I use to decide when to freak out about LLMs, and they haven't improved on it. I was already prepared to be cautiously optimistic that LLMs can't scale to AGI even if they keep scaling by conventional metrics (and would have reduced my p(doom) slightly on that basis), so the fact that scaling itself also seems to be breaking down (maybe, possibly, partially, to whatever extent it actually does break down, I haven't looked into it much) and that we're approaching physical limits are both good things.
I’m not particularly more optimistic about alignment working anytime soon, just about very long timelines.