It will not meaningfully generalize beyond domains with easy verification.
Why can’t we give every domain automated verification? (I won’t claim it’s easy, but it’s easy enough to do with finite resources.) Agency, for instance, is verifiable in competitive games of arbitrary difficulty and scale: just check who won. DeepMind already did this to some degree a year ago with language models and virtual agents: https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/
Every other trait we care about is instrumental to agency to some degree, and the games can be customized to focus on particular aspects, just as a school class focuses on a particular subject.
My college-educated wife and I recently got stuck playing Lego Star Wars… Our solution was to Google it. As others have said, some of these games are poorly designed and very unintuitive, especially one this old. It seems like they should give Claude at least a limited number of Google searches.
The earliest Harry Potter games had help hotlines you could call, which we had to do once when I was 9.
It’s hilarious that it sometimes thinks the game might be broken, like an angry teenager claiming lag when he loses a firefight in CoD.