We have strong outside-view reasons to expect that the information processing in question probably approximates Bayesian reasoning (for some model of the environment), and the decision-making process approximately maximizes some expected utility function (which itself approximates fitness within the ancestral environment).
The use of “approximates” in this sentence (and in the post as a whole) is so loose as to be deeply misleading—for the same reasons that the “blue-minimising robot” shouldn’t be described as maximising some expected utility function, and the information processing done by a single neuron shouldn’t be described as Bayesian reasoning (even approximately!)
See also: coherent behaviour in the real world is an incoherent concept.
I think the idea that real-world coherence can’t work mainly stems from everybody relying on the VNM utility theorem, and then trying to make it work directly without first formulating the agent’s world-model as a separate step. If we just forget about the VNM utility theorem and come at the problem from a more principled Bayesian angle instead, things work out just fine.
Here’s the difference: the VNM utility theorem postulates “lotteries” as something already present in the ontology. Agents have preferences over lotteries directly, and agents’ preferences must take probabilities as inputs. There’s no built-in notion of what exactly “randomness” means, what exactly a “probability” physically corresponds to, or anything like that. If we formulate those notions correctly, then things work, but the VNM theorem does not itself provide that formulation, so everybody gets confused.
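To pin that down (standard statement, in rough notation): a lottery is an explicit probability distribution over outcomes, $L = (p_1, x_1; \ldots; p_n, x_n)$, the preference relation $\succeq$ is defined directly over such objects, and the theorem says that if $\succeq$ satisfies the axioms, then there is some utility function $u$ with

$$L \succeq M \iff \sum_i p_i \, u(x_i) \ge \sum_j q_j \, u(y_j), \quad \text{where } M = (q_1, y_1; \ldots; q_m, y_m).$$

The $p_i$ and $q_j$ are handed in as part of the objects the agent has preferences over; the theorem says nothing about where they come from or what in the world they correspond to.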
Contrast that with e.g. the fundamental theorem of asset pricing (FTAP) plus Dutch book arguments: these reach a similar-looking conclusion to the VNM theorem (i.e. maximize expected utility), but the assumptions are quite different. In particular, they do not start with any inherent notion of “probability”—assuming inexploitability, they show that some (not necessarily unique) probability distribution exists under which the agent can be interpreted as maximizing expected utility. This puts the focus on the real issue: what exactly is the agent’s world-model?
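As a concrete instance of the Dutch book direction (a textbook toy example, nothing specific to the posts under discussion): suppose the agent will buy or sell a ticket paying out 1 if $A$ occurs at price $q(A) = 0.4$, and a ticket paying out 1 if $\neg A$ occurs at price $q(\neg A) = 0.4$. A bookie buys both tickets, paying 0.8 total; exactly one of $A$, $\neg A$ obtains, so the bookie collects 1 and the agent takes a guaranteed loss of 0.2. Inexploitability therefore forces

$$q(A) + q(\neg A) = 1,$$

and similar books force the other probability axioms. The probability distribution is an output of the no-sure-loss condition rather than an input to it, which is the contrast with the VNM setup above.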
As you say in the post you linked:
those hypothetical choices are always between known lotteries with fixed probabilities, rather than being based on our subjective probability estimates as they are in the real world… VNM coherence is not well-defined in this setup, so if we want to formulate a rigorous version of this argument, we’ll need to specify a new definition of coherence which extends the standard instantaneous-hypothetical one.
… which is exactly right. That’s why I consider VNM coherence a bad starting point for this sort of thing.
Getting more into the particulars of that post...
I would summarize the main argument in your post as roughly: “we can’t observe counterfactual behavior, and without that we can’t map the utility function, unless the utility function is completely static and depends only on the current state of the world.” So we can’t map utilities over trajectories, we can’t map off-equilibrium strategies, we can’t map time-dependent utilities, etc.
The problem with that line of argument is that it treats the agent as a black box. Breaking open the black box is what embedded agency is all about, including all the examples in the OP. Once the black box is open, we do not need to rely on observed behavior—we know what the internal gears are, so we can talk about counterfactual behavior directly. In particular, once the black box is open, we can (in principle) talk about the agent’s internal ontology. Once the agent’s internal ontology is known, possibilities like “the agent prefers to travel in circles” are hypotheses we can meaningfully check—not by observing the agent’s behavior, but by seeing what computation it performs with its internal notion of “travelling in circles”.
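As a toy illustration of the difference (my own made-up sketch in Python; the robot and the scene format are hypothetical, loosely modeled on the blue-minimising robot): once we have the agent’s actual decision procedure, a counterfactual query is just a function call on an input the agent never encountered, rather than an inference from logged behavior.

```python
# Toy sketch: an agent whose internal gears we can inspect directly.
# (Hypothetical example, loosely modeled on the blue-minimising robot.)

def blue_minimizer_policy(scene):
    """Fire the laser at whichever object in the scene looks most blue."""
    target = max(range(len(scene)), key=lambda i: scene[i]["blueness"])
    return {"action": "fire_laser", "target": target}

# Behavioral data only ever shows us scenes the robot actually encountered:
observed_scene = [{"blueness": 0.1}, {"blueness": 0.9}, {"blueness": 0.3}]
print(blue_minimizer_policy(observed_scene))   # fires at object 1

# With the code in hand, a counterfactual is just another function call,
# with no need to infer a utility function from observed choices:
no_blue_scene = [{"blueness": 0.0}, {"blueness": 0.0}]
print(blue_minimizer_policy(no_blue_scene))    # still fires, at object 0
```

The point is only that, once the gears are visible, questions like “what would it do if nothing were blue?” become facts about a computation we can run, with no behavioral data required.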