Some ideas of things it might do more often or eagerly:
Whether it endorses treating animals poorly
Whether it endorses treating other AIs poorly
Whether it endorses things harmful to itself
Whether it endorses humans eating animals
Whether it endorses sacrificing some people for “the greater good” and/or “good of humanity”
I think RL on chain of thought will continue improving reasoning in LLMs. That opens the door to learning a wider and wider variety of tasks as well as general strategies for generating hypotheses and making decisions. I think benchmarks could be just as likely to underestimate AI capabilities by not measuring the right things, under-elicitation, or poor scaffolding.
We generally see time horizons for models increasing over time. If long-term planning is a special form of reasoning, LLMs can do it a little sometimes, and we can give them examples and problems to train on, I think it’s very well within reach. If you think it’s fundamentally different than reasoning, current LLMs can never do it, and it will be impossible or extremely difficult to give them examples and practice problems, then I’d agree the case looks more bearish.