Here’s a non-obvious way it could fail. I don’t expect researchers to make this kind of mistake, but if this reasoning is correct, public access to such an AI is definitely not a good idea.
Also, consider a text predictor that is trying to roleplay as an unaligned superintelligence. This situation could be triggered even without the user’s knowledge, for example by accidentally creating a conversation which the AI relates to a story about a rogue SI. In that case it may start to output manipulative replies, suggest blueprints for agentic AIs, and maybe even get the user to run an obfuscated version of the program from the linked post. The AI doesn’t need to be an agent for any of this to happen (though it would clearly be much more likely if it were one).
I don’t think any of those failure modes (including the model developing some sort of internal agent to better predict text) is very likely to happen in a controlled environment. However, as others have mentioned, agent AIs are simply more powerful, so we’re going to build them too.
At this point I have to ask what exactly is meant by this. The bigger model beats average human performance on Poland’s national math exam. Sure, the people taking that exam are usually not adults, but for many it may be where their mathematical abilities peak, so I wouldn’t be surprised if it also beats average human performance in the US. It’s all rather vague, though; looking at the MATH dataset paper, all I could find regarding human performance was the following:
So, for solving undergraduate-level math problems, this model would land somewhere between university students who dislike mathematics and those who are neutral towards it? Maybe. It would be nice to get more details here; I assume the authors didn’t think much about human-level performance, since the previous SOTA was clearly very far from it.