In worlds where natural abstraction basically fails, we are thoroughly and utterly fucked
I’m not really sure what it would mean for the natural abstraction hypothesis to turn out to be true, or false. The hypothesis itself seems insufficiently clear to me.
On your view, if there are no “natural abstractions,” then we should predict that AIs will “generalize off-distribution” in ways that are catastrophic for human welfare. Okay, fine. I would prefer to just talk directly about the probability that AIs will generalize in catastrophic ways. I don’t see any reason to think they will, and so maybe in your ontology that means I must accept the natural abstraction hypothesis. But in my ontology, there’s no “natural abstraction hypothesis” link in the logical chain; I’m just directly applying induction to what we’ve seen so far about AI behavior.
It’s not clear that this is undesired behavior from the perspective of OpenAI. They aren’t actually putting GPT in a situation where it will make high-stakes decisions, and upholding deontological principles seems better from a PR perspective than consequentialist reasoning in these cases.