Thanks for the replies! I’m still somewhat confused but will try again to both ask the question more clearly and summarise my current understanding.
What, in the case of consequentialists, is analogous to the water funnelled by literal funnels? Is it possibilities-according-to-us? Or is it possibilities-according-to-the-consequentialist? Or is it neither (or both) of those?
To clarify what the options in my original comment were, I’ll say what I think they correspond to for literal funnels. Option 1 corresponds to the fact that funnels are usually nearby (in spacetime) when water ends up in a small space without having spilled, and Option 2 corresponds to the characteristic funnel shape (perhaps in combination with facts about physical laws).
I think your and Eliezer’s replies are pointing me at a sense in which both Option 1 and Option 2 are correct, but they are used in different ways in the overall story. To tell this story, I want to draw a distinction between outcome-pumps (behavioural agents) and consequentialists (structural agents). Outcome-pumps are effective at achieving outcomes, and this effectiveness is measured according to our models (Option 1). Consequentialists do (or have done in their causal history) the work of selecting actions according to expected consequences in coherent pursuit of an outcome, and the expected consequences are therefore their own (Option 2).
Spelling this out a little more: outcome-pumps are optimizing systems. There is a space of possible configurations, a much smaller target subset of configurations, and a basin of attraction such that if the system+surroundings starts within the basin, it ends up within the target. There are at least two ways of looking at the configuration space. Firstly, there’s the range of situations in which we have actually observed the same (or a similar) outcome-pump system achieve its outcome. Secondly, there’s the range of hypothetical situations we can imagine putting the outcome-pump system into, for which we extrapolate (using our own models) that it will achieve the outcome. Both of these ways are “Option 1”.
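To make the optimizing-systems picture concrete, here is a toy sketch (my own illustration; the one-dimensional dynamics, the numbers, and the function names are all made up for this comment, not taken from Flint’s post):

```python
# Toy optimizing system: a 1-D configuration space whose dynamics pull any
# starting configuration toward a small target set around 0.

def step(x: float) -> float:
    """One time-step of the system's dynamics: move 10% of the way toward 0."""
    return 0.9 * x

def ends_in_target(x0: float, target_radius: float = 0.01, steps: int = 500) -> bool:
    """Check whether a trajectory starting at x0 ends inside the target set |x| <= target_radius."""
    x = x0
    for _ in range(steps):
        x = step(x)
    return abs(x) <= target_radius

# The basin of attraction here is (essentially) the whole line: perturbing the
# starting configuration still lands the system in the same small target set.
for x0 in (-5.0, -0.3, 2.0, 4.7):
    print(x0, ends_in_target(x0))  # all True
```

Nothing in this toy system models anything or selects actions; it just reliably funnels configurations into the target set, which is all the behavioural, “Option 1” perspective needs.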
Consequentialists (structural agents) do the work, somewhere somehow—maybe in their brains, maybe in their causal history, maybe in other parts of their structure and history—of maintaining and updating beliefs and selecting actions that lead to (their modelled) expected consequences that are high in their preference ordering (this is all Option 2).
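And here, for contrast, is an equally toy sketch of the structural picture (again my own illustration, with made-up actions, probabilities, and utilities): the work lives in an explicit model of expected consequences plus a preference ordering over them.

```python
# Minimal consequentialist: the agent's *own* model of consequences plus a
# preference ordering, used to select the action with the best expected outcome.

beliefs = {                       # the agent's model: P(outcome | action)
    "press_lever": {"food": 0.8, "nothing": 0.2},
    "wait":        {"food": 0.1, "nothing": 0.9},
}
preferences = {"food": 1.0, "nothing": 0.0}   # higher = more preferred

def expected_value(action: str) -> float:
    """Expected preference-value of an action under the agent's own beliefs."""
    return sum(p * preferences[outcome] for outcome, p in beliefs[action].items())

best_action = max(beliefs, key=expected_value)
print(best_action)  # "press_lever" -- chosen because of modelled consequences
```

Here the expected consequences belong to the agent’s own model, not to ours; whether it actually pumps outcomes in the world then depends on how good that model is.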
It should be fairly uncontroversial that consequentialists are outcome pumps, to the extent that they’re any good at doing the consequentialist thing (and have sufficiently achievable preferences relative to their resources, etc.).
The more substantial claim I read MIRI as making is that outcome pumps are consequentialists, because the only way to be an outcome pump is to be a consequentialist. Maybe you wouldn’t make this claim so strongly, since there are counterexamples like fires and black holes—and there may be some restrictions on what kind of outcome pumps the claim applies to (such as some level of retargetability or robustness?).
How does this overall take sound?
Scott Garrabrant’s question on whether agent-like behaviour implies agent-like architecture seems pretty relevant to this whole discussion—Eliezer, do you have an answer to that question? Or at least do you think it’s an important open question?
My reply to your distinction between ‘consequentialists’ and ‘outcome pumps’ would be, “Please forget entirely about any such thing as a ‘consequentialist’ as you defined it; I would now like to talk entirely about powerful outcome pumps. All understanding begins there, and we should only introduce the notion of how outcomes are pumped later in the game. Understand the work before understanding the engines; nearly every key concept here is implicit in the notion of work rather than in the notion of a particular kind of engine.”
(Modulo that lots of times people here are like “Well but a human at a particular intelligence level in a particular complicated circumstance once did this kind of work without the thing happening that it sounds like you say happens with powerful outcome pumps”; and then you have to look at the human engine and its circumstances to understand why outcome pumping could specialize down to that exact place and fashion, which will not be reduplicated in more general outcome pumps that have their dice re-rolled.)
A couple of direct questions I’m stuck on:
1. Do you agree that Flint’s optimizing systems are a good model (or even definition) of outcome pumps?
2. Are black holes and fires reasonable examples of outcome pumps?
I’m asking these to understand the work better.
Currently my answers are:
1. Yes. Flint’s notion is one I came to independently when thinking about “goal-directedness”. It could be missing some details, but I find it hard to snap out of the framework entirely.
2. Yes. But maybe not the most informative examples. They’re highly non-retargetable.
“Understand the work before understanding the engines; nearly every key concept here is implicit in the notion of work rather than in the notion of a particular kind of engine.”
I don’t know the relevant history of science, but I wouldn’t be surprised if something like the opposite were true: our modern, very useful understanding of work is an abstraction that grew out of many people thinking concretely about various engines. Thinking about engines was like the homework exercises that helped people reach and understand the concept of work.
Similarly, perhaps it is pedagogically (and conceptually) helpful to begin with the notion of a consequentialist and then generalize to outcome pumps.