Possible outcomes are in the mind of a world-modeller—reality just is as it is (exactly one way) and isn’t made of possibilities. So in what sense do the consequentialist-like things Yudkowsky is referring to funnel history?
I’m not sure that I understand the question, but my intuition is to say: they funnel world-states into particular outcomes in the same sense that literal funnels funnel water into particular spaces, or in the same sense that a slope makes things roll down it.
If you find water in a previously-empty space with a small aperture, and you’re confused that no water seems to have spilled over the sides, you may suspect that a funnel was there. Funnels are part of a larger deterministic universe, so maybe in some sense any given funnel (like everything else) ‘had to do exactly that thing’. Still, we can observe that funnels are an important part of the causal chain in these cases, and that places with funnels tend to end up with this type of outcome much more often.
Similarly, consequentialists tend to remake parts of the world (typically, as much of the world as they can reach) into things that are high in their preference ordering. From Optimization and the Singularity:
[...] Suppose you have a car, and suppose we already know that your preferences involve travel. Now suppose that you take all the parts in the car, or all the atoms, and jumble them up at random. It’s very unlikely that you’ll end up with a travel-artifact at all, even so much as a wheeled cart; let alone a travel-artifact that ranks as high in your preferences as the original car. So, relative to your preference ordering, the car is an extremely improbable artifact; the power of an optimization process is that it can produce this kind of improbability.
You can view both intelligence and natural selection as special cases of optimization: Processes that hit, in a large search space, very small targets defined by implicit preferences. Natural selection prefers more efficient replicators. Human intelligences have more complex preferences. Neither evolution nor humans have consistent utility functions, so viewing them as “optimization processes” is understood to be an approximation. You’re trying to get at the sort of work being done, not claim that humans or evolution do this work perfectly.
This is how I see the story of life and intelligence—as a story of improbably good designs being produced by optimization processes. The “improbability” here is improbability relative to a random selection from the design space, not improbability in an absolute sense—if you have an optimization process around, then “improbably” good designs become probable. [...]
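To make "improbability relative to a random selection from the design space" concrete, here is a tiny sketch of my own (nothing from the quoted post; the bit-string "design space" and the scoring function are stand-ins I made up). Random draws essentially never land in the high-preference region, while even a crude hill-climber gets there reliably:

```python
import random

# Toy "design space": bit strings of length 40. The "preference ordering"
# scores a design by how many bits match a fixed preferred pattern.
N = 40
random.seed(0)
preferred_pattern = [random.randint(0, 1) for _ in range(N)]

def score(design):
    """Higher is better: count of bits agreeing with the preferred pattern."""
    return sum(d == t for d, t in zip(design, preferred_pattern))

# Random selection from the design space almost never produces a
# high-scoring design (scores cluster around N/2 = 20).
random_scores = [score([random.randint(0, 1) for _ in range(N)])
                 for _ in range(1000)]
print("best of 1000 random designs:", max(random_scores))

# A crude hill-climber (flip one bit at a time, keep improvements)
# reliably reaches the tiny high-scoring region of the space.
design = [random.randint(0, 1) for _ in range(N)]
for _ in range(2000):
    i = random.randrange(N)
    candidate = design.copy()
    candidate[i] ^= 1
    if score(candidate) >= score(design):
        design = candidate
print("hill-climbed design score:", score(design))  # typically 40/40
```

The hill-climber is a very weak optimization process, but it already produces a design that random selection would essentially never cough up.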
But it’s not clear what a “preference” is, exactly. So a more general way of putting it, in Recognizing Intelligence, is:
[...] Suppose I landed on an alien planet and discovered what seemed to be a highly sophisticated machine, all gleaming chrome as the stereotype demands. Can I recognize this machine as being in any sense well-designed, if I have no idea what the machine is intended to accomplish? Can I guess that the machine’s makers were intelligent, without guessing their motivations?
And again, it seems like in an intuitive sense I should obviously be able to do so. I look at the cables running through the machine, and find large electrical currents passing through them, and discover that the material is a flexible high-temperature high-amperage superconductor. Dozens of gears whir rapidly, perfectly meshed...
I have no idea what the machine is doing. I don’t even have a hypothesis as to what it’s doing. Yet I have recognized the machine as the product of an alien intelligence.
[...] Why is it a good hypothesis to suppose that intelligence or any other optimization process played a role in selecting the form of what I see, any more than it is a good hypothesis to suppose that the dust particles in my rooms are arranged by dust elves?
Consider that gleaming chrome. Why did humans start making things out of metal? Because metal is hard; it retains its shape for a long time. So when you try to do something, and the something stays the same for a long period of time, the way-to-do-it may also stay the same for a long period of time. So you face the subproblem of creating things that keep their form and function. Metal is one solution to that subproblem.
[… A]s simple a form of negentropy as regularity over time—that the alien’s terminal values don’t take on a new random form with each clock tick—can imply that hard metal, or some other durable substance, would be useful in a “machine”—a persistent configuration of material that helps promote a persistent goal.
The gears are a solution to the problem of transmitting mechanical forces from one place to another, which you would want to do because of the presumed economy of scale in generating the mechanical force at a central location and then distributing it. In their meshing, we recognize a force of optimization applied in the service of a recognizable instrumental value: most random gears, or random shapes turning against each other, would fail to mesh, or fly apart. Without knowing what the mechanical forces are meant to do, we recognize something that transmits mechanical force—this is why gears appear in many human artifacts, because it doesn’t matter much what kind of mechanical force you need to transmit on the other end. You may still face problems like trading torque for speed, or moving mechanical force from generators to appliers.
These are not universally convergent instrumental challenges. They probably aren’t even convergent with respect to maximum-entropy goal systems (which are mostly out of luck).
But relative to the space of low-entropy, highly regular goal systems—goal systems that don’t pick a new utility function for every different time and every different place—that negentropy pours through the notion of “optimization” and comes out as a concentrated probability distribution over what an “alien intelligence” would do, even in the “absence of any hypothesis” about its goals. [...]
“Consequentialists funnel the universe into shapes that are higher in their preference ordering” isn’t a required inherent truth for all consequentialists; some might have weird goals, or be too weak to achieve much. Likewise, some literal funnels are broken or misshapen, or just never get put to use. But in both cases, we can understand the larger class by considering the unusual function well-working instances can perform.
(In the case of literal funnels, we can also understand the class by considering its physical properties rather than its function/behavior/effects. Eventually we should be able to do the same for consequentialists, but currently we don’t know what physical properties of a system make it consequentialist, beyond the level of generality of e.g. ‘its future-steering will approximately obey expected utility theory’.)
Thanks for the replies! I’m still somewhat confused but will try again to both ask the question more clearly and summarise my current understanding.
What, in the case of consequentialists, is analogous to the water funnelled by literal funnels? Is it possibilities-according-to-us? Or is it possibilities-according-to-the-consequentialist? Or is it neither (or both) of those?
To clarify a little what the options in my original comment were, I’ll say what I think they correspond to for literal funnels. Option 1 corresponds to the fact that funnels are usually nearby (in spacetime) when water is in a small space without having spilled, and Option 2 corresponds to the characteristic funnel shape (in combination with facts about physical laws maybe).
I think your and Eliezer’s replies are pointing me at a sense in which both Option 1 and Option 2 are correct, but they are used in different ways in the overall story. To tell this story, I want to draw a distinction between outcome-pumps (behavioural agents) and consequentialists (structural agents). Outcome-pumps are effective at achieving outcomes, and this effectiveness is measured according to our models (Option 1). Consequentialists do (or have done in their causal history) the work of selecting actions according to expected consequences in coherent pursuit of an outcome, and the expected consequences are therefore their own (Option 2).
Spelling this out a little more—Outcome-pumps are optimizing systems: there is a space of possible configurations, a much smaller target subset of configurations, and a basin of attraction such that if the system+surroundings starts within the basin, it ends up within the target. There are at least two ways of looking at the configuration space. Firstly, there’s the range of situations in which we actually observe the same (or a similar) outcome-pump system achieving its outcome. Secondly, there’s the range of hypothetical situations we can imagine putting the outcome-pump system into, where we extrapolate (using our own models) that it will achieve the outcome. Both of these ways are “Option 1”.
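For concreteness, here is a minimal sketch of an optimizing system in roughly this sense (the one-dimensional configuration space, the `step` dynamics, and the particular numbers are my own toy choices, not Flint's): a wide basin of starting configurations gets funnelled into a narrow target region, and an observer can check that tendency without knowing anything about the system's internals.

```python
import random

# A toy optimizing system: configurations are real numbers, the target set
# is a narrow interval around 1.0, and the dynamics pull configurations there.

TARGET = (0.95, 1.05)     # small target subset of the configuration space
BASIN = (-10.0, 10.0)     # the basin of attraction we claim for the system

def step(x):
    """One tick of system+surroundings: relax toward x = 1.0, plus noise."""
    return x + 0.1 * (1.0 - x) + random.gauss(0, 0.005)

def ends_in_target(x0, ticks=200):
    x = x0
    for _ in range(ticks):
        x = step(x)
    return TARGET[0] <= x <= TARGET[1]

# Start the system at many different points inside the basin; despite the
# perturbations, it almost always ends up in the small target region.
# That robust tendency is the observable, "Option 1" fact.
random.seed(1)
starts = [random.uniform(*BASIN) for _ in range(100)]
print(sum(ends_in_target(x0) for x0 in starts), "of 100 runs hit the target")
```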
Consequentialists (structural agents) do the work, somewhere somehow—maybe in their brains, maybe in their causal history, maybe in other parts of their structure and history—of maintaining and updating beliefs and selecting actions that lead to (their modelled) expected consequences that are high in their preference ordering (this is all Option 2).
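And for contrast, a minimal sketch of that structural, Option 2 work (again a made-up toy; the two-state world, the likelihoods, and the utility table are assumptions for illustration only): the system maintains a belief, updates it on evidence, and picks the action with the highest expected consequences under its own model.

```python
# Belief over a hidden world-state: is the door locked?
belief = {"locked": 0.5, "unlocked": 0.5}

def bayes_update(belief, likelihood):
    """Update the belief given P(observation | state) for each state."""
    posterior = {s: belief[s] * likelihood[s] for s in belief}
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()}

# Observation: the handle doesn't turn -- much likelier if locked.
belief = bayes_update(belief, {"locked": 0.9, "unlocked": 0.2})

# The agent's own model of consequences: utility of each action in each state.
utility = {
    "push door": {"locked": 0.0, "unlocked": 1.0},
    "fetch key": {"locked": 0.8, "unlocked": 0.3},
}

def expected_utility(action):
    return sum(belief[s] * utility[action][s] for s in belief)

# The expectation here is taken under the agent's model, not ours.
best = max(utility, key=expected_utility)
print(belief, "->", best)
```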
It should be somewhat uncontroversial that consequentialists are outcome pumps, to the extent that they’re any good at doing the consequentialist thing (and have sufficiently achievable preferences relative to their resources etc).
The more substantial claim I read MIRI as making is that outcome pumps are consequentialists, because the only way to be an outcome pump is to be a consequentialist. Maybe you wouldn’t make this claim so strongly, since there are counterexamples like fires and black holes—and there may be some restrictions on what kind of outcome pumps the claim applies to (such as some level of retargetability or robustness?).
How does this overall take sound?
Scott Garrabrant’s question on whether agent-like behaviour implies agent-like architecture seems pretty relevant to this whole discussion—Eliezer, do you have an answer to that question? Or at least do you think it’s an important open question?
My reply to your distinction between ‘consequentialists’ and ‘outcome pumps’ would be, “Please forget entirely about any such thing as a ‘consequentialist’ as you defined it; I would now like to talk entirely about powerful outcome pumps. All understanding begins there, and we should only introduce the notion of how outcomes are pumped later in the game. Understand the work before understanding the engines; nearly every key concept here is implicit in the notion of work rather than in the notion of a particular kind of engine.”
(Modulo that lots of times people here are like “Well but a human at a particular intelligence level in a particular complicated circumstance once did this kind of work without the thing happening that it sounds like you say happens with powerful outcome pumps”; and then you have to look at the human engine and its circumstances to understand why outcome pumping could specialize down to that exact place and fashion, which will not be reduplicated in more general outcome pumps that have their dice re-rolled.)
A couple of direct questions I’m stuck on:
Do you agree that Flint’s optimizing systems are a good model (or even definition) of outcome pumps?
Are black holes and fires reasonable examples of outcome pumps?
I’m asking these to understand the work better.
Currently my answers are:
Yes. Flint’s notion is one I came to independently when thinking about “goal-directedness”. It could be missing some details, but I find it hard to snap out of the framework entirely.
Yes. But maybe not the most informative examples. They’re highly non-retargetable.
“Understand the work before understanding the engines; nearly every key concept here is implicit in the notion of work rather than in the notion of a particular kind of engine.”
I don’t know the relevant history of science, but I wouldn’t be surprised if something like the opposite were true: our modern, very useful understanding of work is an abstraction that grew out of many people thinking concretely about various engines. Thinking about engines was like the homework exercises that helped people reach and understand the concept of work.
Similarly, perhaps it is pedagogically (and conceptually) helpful to begin with the notion of a consequentialist and then generalize to outcome pumps.