Short answer: no.
Longer answer: we need to distinguish between two things people might have in mind when they say that LLMs “solve the hidden complexity of wishes problem”.
First, one might imagine that LLMs “solve the hidden complexity of wishes problem” because they’re able to answer natural-language questions about humans’ wishes much the same way a human would. Alas, that’s a misunderstanding of the problem. If the ability to answer natural-language questions about humans’ wishes in human-like ways were all we needed in order to solve the “hidden complexity of wishes” problem, then a plain old human would be a solution to the problem; one could just ask the human. Part of the problem is that humans themselves understand their own wishes so poorly that their own natural-language responses to questions are not a safe optimization target either.
Second, one might imagine LLMs “solve the hidden complexity of wishes problem” because when we ask an LLM to solve a problem, it solves the problem in a human-like way. It’s not about the LLM’s knowledge of humans’ (answers to questions about their) wishes, but rather about LLMs solving problems and optimizing in ways which mimic human problem-solving and optimization. And that does handle the hidden complexity problem… but only insofar as we continue to use LLMs in exactly the same way. If we start e.g. scaling up o1-style methods, or doing HCH, or putting the LLM in some other scaffolding so that we’re not directly asking it to solve a problem and then using the human-like solutions it generates… then we’re (potentially) back to having a hidden complexity problem. For each of those different methods of using the LLM to solve problems, we have to separately consider whether the human-mimicry properties of the LLM generalize to that method well enough to handle the hidden complexity issue.
(Toy example: suppose we use LLMs to mimic a very, very large organization. As in most real-world organizations, information and constraints end up fairly siloed/modularized, so some parts of the system are optimizing for e.g. “put out the fire” and don’t know that grandma’s in the house at all. And then maybe that part of the system chooses a nice efficient fire-extinguishing approach which kills grandma, e.g. collapsing the house and then smothering it.)
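To make the siloing concrete, here is a minimal Python sketch of that kind of scaffold. Everything in it is hypothetical (the `call_llm` stub, the planner/worker split), not any particular system’s API; the point is just that each worker call only ever sees its own subtask text, so any constraint the planner drops (like “grandma is inside”) never reaches the part of the system doing the optimizing.

```python
# Hypothetical sketch of a scaffold that decomposes a request into subtasks,
# each handled by a separate LLM call which sees ONLY its own subtask.

def call_llm(prompt: str) -> str:
    """Stand-in for whatever model API the scaffold would use."""
    raise NotImplementedError("placeholder for a real model call")

def scaffold(user_request: str) -> list[str]:
    # A "planner" call splits the request into independent subtasks.
    plan = call_llm(f"Break this request into independent subtasks:\n{user_request}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # Each "worker" call sees only the subtask text -- not the original request,
    # not the other subtasks. Any constraint the planner failed to copy into a
    # subtask is invisible to the worker optimizing "put out the fire".
    return [
        call_llm(f"Solve this subtask as efficiently as possible:\n{task}")
        for task in subtasks
    ]
```

Even if every individual call answers in a perfectly human-like way, the scaffold as a whole can still optimize against an impoverished version of the original wish.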
And crucially: if AI is ever to solve problems too hard for humans (which is one of its main value propositions), we’re definitely going to need to do something with LLMs besides using them to solve problems in human-like ways.