It sure doesn’t seem to generalize in GPT-4o’s case. But what’s the hypothesis for Sonnet 3.5 refusing in 85% of cases? And the fact that CoT improves the score, and that o1 does better in the browser environment, suggests the problem is that models don’t understand the consequences, not that they aren’t trying to be good. What’s the rate of capability generalization to the agent environment? Are we going to conclude that Sonnet merely demonstrates reasoning, instead of doing it for real, if it solves only 85% of the tasks it correctly talks about?
Also, what’s the rate at which avoidance of unprompted problematic behaviour generalizes? It’s much less of a problem if your AI does what you tell it to do: you can just not give it to users, tell it to invent nanotechnology, and win.
It doesn’t, and they are fundamentally equal. The only reality is the physical one; there is no reason to complicate your ontology with platonically existing math. Math is just a collection of useful templates that may help you predict reality, and the fact that it works is always just a physical fact. The best case is that we learn the true laws of physics, they turn out to work like some subset of math, and then the axioms of that subset are actually true. You can make a guess about which axioms are compatible with true physics.
Also, there is Shoenfield’s absoluteness theorem, which I don’t understand deeply, but which maybe prevents empirical grounding of CH?
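For what it’s worth, the rough statement I’ve seen (my own gloss, so possibly off in the details) is that $\Sigma^1_2$ and $\Pi^1_2$ sentences are absolute between a model of ZF and its constructible inner model:

$$V \models \varphi \iff L \models \varphi \quad \text{for } \varphi \in \Sigma^1_2 \cup \Pi^1_2.$$

Arithmetic sentences sit well below $\Sigma^1_2$, so this would mean CH has no new arithmetic consequences over ZFC: if ZFC + CH proved an arithmetic $\varphi$, you could pass to $L$ (where CH holds), get $\varphi$ there, and pull it back to $V$ by absoluteness, so ZFC alone would already prove $\varphi$. If empirical facts only ever pin down arithmetic-level statements, that would be the sense in which CH can’t be grounded empirically.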