abramdemski comments on Counterfactual Mugging: Why should you pay?

abramdemski 20 Dec 2019 16:54 UTC
2 points
Ok, this helps me understand your view better. But not completely. I don’t think there is such a big difference between the agent and the agent-designer.
Who are the “we” in this setup? A world designer can sure create an NPC (which you are in this setup) to one-box. Can the NPC itself change their algorithm?
We (as humans) are (always) still figuring out how to make decisions. From our perspective, we are still inventing the decision algorithm. From OO’s perspective, we were always going to behave a certain way. But, this does not contradict our perspective; OO just knows more.
In the computer-programmed scenario, there is a chain of decision points:
we think of the idea → we start programming, and design various bots → the bots themselves learn (in the case of ML bots), which selects between various strategies → the strategies themselves perform some computation to select actions
In the OO case, it does not matter so much where in this chain a particular computation occurs (because omniscient omega can predict the entire chain equally well). So it might be that I implement a bit of reasoning when writing a bot; or it might be the learning algorithm that implements that bit of reasoning; or it might be the learned strategy.
Similarly, we have a chain which includes biological evolution, cultural innovation, our parents meeting, our conception, our upbringing, what we learn in school, what we think about at various points in our lives, leading up to this moment.
Who are the “we” in this setup? A world designer can sure create an NPC (which you are in this setup) to one-box. Can the NPC itself change their algorithm?
I do not think there is a special point in the chain. Well—it’s true that different points in the chain have varying degrees of agency. But any point in the chain which is performing important computation “could”, from its perspective, do something differently, changing the remainder of the chain. So we, the bot designer, could design the bot differently (from our perspective when choosing how to design the bot). The bot’s learning algorithm could have selected a different strategy (from its perspective). And the strategy could have selected a different action.
Of course, from our perspective, it is a little difficult to imagine the learning algorithm selecting a different strategy, if we understand how the learning algorithm works. And it is fairly difficult to imagine the strategy selecting a different action, since it is going to be a relatively small computation. But this is the same way that OO would have difficulty thinking of us doing something different, since OO can predict exactly what we do and exactly how we arrive at our decision. The learning algorithm’s entire job is to select between different alternative strategies; it has to “think as if it has a choice”, or else it could not perform the computation it needs to perform. Similarly, the learned strategy has to select between different actions; if there is a significant computational problem being solved by doing this, it must be “thinking as if it had a choice” as well (though, granted, learned strategies are often more like lookup tables, in which case I would not say that).
This does not mean choice is an illusion at any point in the chain. Choice is precisely the computation which chooses between alternatives. The alternatives are an illusion, in that counterfactuals are subjective.
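A minimal sketch of this point, under stated assumptions: the strategy names and scores are hypothetical placeholders, and `omniscient_predictor` is only a stand-in for OO that predicts by running the same computation. The learner is fully deterministic, yet it cannot produce its output without evaluating every alternative.

```python
from typing import Callable, Dict

# Hypothetical scores for two candidate strategies (illustrative values only).
CANDIDATE_SCORES: Dict[str, float] = {"strategy_a": 1.0, "strategy_b": 0.0}


def learner(scores: Dict[str, float]) -> str:
    """Deterministically return the highest-scoring strategy.

    The function has to consider every alternative in `scores`,
    even though only one of them can end up being selected.
    """
    return max(scores, key=lambda name: scores[name])


def omniscient_predictor(agent: Callable[[Dict[str, float]], str],
                         scores: Dict[str, float]) -> str:
    """Stand-in for OO: predict the agent by running the agent's own computation."""
    return agent(scores)


chosen = learner(CANDIDATE_SCORES)
predicted = omniscient_predictor(learner, CANDIDATE_SCORES)

# The prediction matches the choice, but the choice still depends on the comparison:
# with different scores, the same learner selects a different strategy.
flipped = learner({"strategy_a": 0.0, "strategy_b": 1.0})
```

Here `predicted` equals `chosen` because the predictor simply knows more, while `flipped` shows that the output is determined by the comparison between alternatives and by nothing outside it.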
So that’s my view. I’m still confused about aspects of your view. Particularly, this:
If you are a sufficiently smart NPC in the OO world, you will find that the only self-consistent approach is to act while knowing that you are just acting out your programming and that “decisions” are an illusion you cannot avoid.
How is this consistent with your assertion that OO-problems are inconsistent because “you cannot optimize for interaction with an interaction with OO”? As you say, the NPC is forced to consider the “illusion” of choice—it is an illusion which cannot be avoided. Furthermore, this is due to the real situation which it actually finds itself in. (Or at least, the realistic scenario which we are imagining it is in.) So it seems to me it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem (e.g., UDT-like thinking will tend to produce better results). So,
- The alternatives are fake (counterfactuals are subjective), but,
- The problem is real,
- The agent has to make a choice,
- There are better and worse ways of reasoning about that choice—we can see that agents who reason in one way or another do better/worse,
- It helps to study better and worse ways of reasoning ahead of time (whether that’s by ML algorithms learning, or humans abstractly reasoning about decision theory).
So it seems to me that this is very much like any other sort of hypothetical problem which we can benefit from reasoning about ahead of time (e.g., “how to build bridges”). The alternatives are imaginary, but the problem is real, and we can benefit from considering how to approach it ahead of time (whether we’re human or sufficiently advanced NPC).
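To make "agents who reason in one way or another do better/worse" concrete, here is a rough sketch that scores two fixed policies in counterfactual mugging. The $100 cost comes from the thread; the fair coin and the $10,000 reward are assumptions taken from the usual statement of the problem, and the function and parameter names are hypothetical.

```python
def expected_value(pays_when_asked: bool,
                   cost: float = 100.0,       # amount Omega asks for (from the thread)
                   reward: float = 10_000.0,  # assumed reward, not stated in the thread
                   p_ask: float = 0.5) -> float:  # assumed fair coin
    """Expected payoff of a fixed policy, evaluated before the coin is flipped."""
    # Asking branch: Omega asks for `cost`; the agent loses it only if its policy is to pay.
    ask_branch = -cost if pays_when_asked else 0.0
    # Reward branch: Omega pays out only if it predicts the agent would have paid when asked.
    reward_branch = reward if pays_when_asked else 0.0
    return p_ask * ask_branch + (1.0 - p_ask) * reward_branch


ev_payer = expected_value(pays_when_asked=True)
ev_refuser = expected_value(pays_when_asked=False)
```

Under these assumed numbers the paying policy comes out ahead on average, which is the sense in which one way of reasoning does better than another, even though each agent only ever executes the one policy it has.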
- Shmi 21 Dec 2019 3:06 UTC
  2 points
  Parent
  I don’t think there is such a big difference between the agent and the agent-designer.
  Hmm. Seems to me there is a crucial difference: the former is in scope for OO, the latter is not.
  We (as humans) are (always) still figuring out how to make decisions. From our perspective, we are still inventing the decision algorithm. From OO’s perspective, we were always going to behave a certain way. But, this does not contradict our perspective; OO just knows more.
  If you know that someone has predicted your behavior, then you accept that you are a deterministic algorithm, and the inventing of the decision algorithm is just a deterministic subroutine of it. I don’t think we disagree there. The future is set, you are relegated to learning about what it is, and to feel the illusion of inventing the decision algorithm and/or acting on it. A self-consistent attitude in the OO setup is more like “I am just acting out my programming, and it feels like making decisions”.
  we think of the idea → we start programming, and design various bots → the bots themselves learn (in the case of ML bots), which selects between various strategies → the strategies themselves perform some computation to select actions
  Yes and no. “We” in this case are the agent designers, and the bots are agents acting out their programming, but we are neither OO, nor are we in OO’s scope of predictability. In fact maybe there is no OO in that world, especially if the agent has access to quantum randomness or freebits, or is otherwise too hard to predict. That applies to complicated enough automata, like AlphaZero.
  Of course, from our perspective, it is a little difficult to imagine the learning algorithm selecting a different strategy, if we understand how the learning algorithm works. And it is fairly difficult to imagine the strategy selecting a different action, since it is going to be a relatively small computation.
  Right, the more OO-like we are, the less agenty the algorithm feels to us.
  The learning algorithm’s entire job is to select between different alternative strategies; it has to “think as if it has a choice”, or else it could not perform the computation it needs to perform.
  Well. I am not sure that “it has to ‘think as if it has a choice’”. Thinking about having a choice seems like it requires an internal narrator, a degree of self-awareness. It is an open question whether an internal narrator necessarily emerges once the algorithm complexity is large enough. In fact, that would be an interesting open problem to work on, and if I were to do research in the area of agency and decision making, I would look into this as a project.
  If an internal narrator is not required, then there is no thinking about choices, just following the programming that makes a decision. A bacterium following a sugar gradient probably doesn’t think about choices. Not sure what counts as thinking for a chess program and whether it has the quale of having a choice.
  This does not mean choice is an illusion at any point in the chain. Choice is precisely the computation which chooses between alternatives. The alternatives are an illusion, in that counterfactuals are subjective.
  Yes, action is a part of the computation, and sometimes we anthropomorphize this action as making a choice. The alternatives are an illusion indeed, and I am not sure what you mean by counterfactuals there: potential future choices, or paths not taken because they could never have been taken given the agent’s programming.
  How is this consistent with your assertion that OO-problems are inconsistent because “you cannot optimize for interaction with an interaction with OO”? As you say, the NPC is forced to consider the “illusion” of choice—it is an illusion which cannot be avoided. Furthermore, this is due to the real situation which it actually finds itself in. (Or at least, the realistic scenario which we are imagining it is in.) So it seems to me it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem (e.g., UDT-like thinking will tend to produce better results).
  Yep, “it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem”, and these “ways of approaching the problem” are coded by the agent designer, whether explicitly, or by making it create and apply a “decision theory” subroutine. Once the algorithm is locked in by the designer (who is out of scope for OO), in this world an OO already knows what decision theory the agent will discover and use.
  TL;DR: the agent is in scope of OO, while the agent designer is out of scope and so potentially has grounds for thinking of themselves as “making a (free) decision” without breaking self-consistency, while the agent has no such luxury. That’s the “special point in the chain”.
  I am making no claims here whether in the “real world” we are more like agents or more like agent designers, since there are no OOs that we know of that could answer the question.
  - abramdemski 23 Dec 2019 21:29 UTC
    4 points
    Parent
    Yep, “it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem”, and these “ways of approaching the problem” are coded by the agent designer, whether explicitly, or by making it create and apply a “decision theory” subroutine. Once the algorithm is locked in by the designer (who is out of scope for OO), in this world an OO already knows what decision theory the agent will discover and use.
    TL;DR: the agent is in scope of OO, while the agent designer is out of scope and so potentially has grounds for thinking of themselves as “making a (free) decision” without breaking self-consistency, while the agent has no such luxury. That’s the “special point in the chain”.
    What exactly does in-scope / out-of-scope mean? The OO has access to what the designer does (since the designer’s design is given to the OO), so for practical purposes, the OO is predicting the designer perfectly. Just not by simulating the designer. Seems like this is what is relevant in this case.
    I am making no claims here whether in the “real world” we are more like agents or more like agent designers, since there are no OOs that we know of that could answer the question.
    But you are making the claim that there is an objective distinction. It seems to me more like a subjective one: I can look at an algorithm from a number of perspectives; some of them will be more like OO (seeing it as “just an algorithm”), while others will regard the algorithm as an agent (unable to calculate exactly what the algorithm will do, they’re forced to take the intentional stance).
    I.e., for any agent you can imagine an OO for that agent, while you can also imagine a number of other perspectives. (Even if there are true-random bits involved in a decision, we can imagine an OO with access to those true-random bits. For quantum mechanics this might involve a violation of physics (e.g. no-cloning theorem), which is important in some sense, but doesn’t strike me as so philosophically important.)
    I don’t know what it means for there to be a more objective distinction, unless it is the quantum randomness thing, in which case maybe we largely agree on questions aside from terminology.
    Well. I am not sure that “it has to ‘think as if it has a choice’”. Thinking about having a choice seems like it requires an internal narrator, a degree of self-awareness. It is an open question whether an internal narrator necessarily emerges once the algorithm complexity is large enough. In fact, that would be an interesting open problem to work on, and if I were to do research in the area of agency and decision making, I would look into this as a project.
    If an internal narrator is not required, then there is no thinking about choices, just following the programming that makes a decision. A bacterium following a sugar gradient probably doesn’t think about choices. Not sure what counts as thinking for a chess program and whether it has the quale of having a choice.
    I want to distinguish “thinking about choices” from “awareness of thinking about choices” (which seems approximately like “thinking about thinking about choices”, though there’s probably more to it).
    I am only saying that it is thinking about choices, i.e., computing relative merits of different choices, not that it is necessarily consciously aware of doing so, or that it has an internal narrator.
    It “has a perspective” from which it has choices in that there is a describable epistemic position which it is in, not that it’s necessarily self-aware of being in that position in a significant sense.
    If you know that someone has predicted your behavior, then you accept that you are a deterministic algorithm, and the inventing of the decision algorithm is just a deterministic subroutine of it. I don’t think we disagree there.
    (correct)
    The future is set, you are relegated to learning about what it is, and to feel the illusion of inventing the decision algorithm and/or acting on it. A self-consistent attitude in the OO setup is more like “I am just acting out my programming, and it feels like making decisions”.
    This seems to be where we disagree. It is not like there is a separate bit of clockwork deterministically ticking away and eventually spitting out an answer, with “us” standing off to the side and eventually learning what decision was made. We are the computation which outputs the decision. Our hand is not forced. So it does not seem right to me to say that the making-of-decisions is only an illusion. If we did not think through the decisions, they would in fact not be made the same. So the thing-which-determines-the-decision is precisely such thinking. There is not a false perception about what hand is pulling the strings in this scenario; so what is the illusion?
    - Shmi 24 Dec 2019 5:24 UTC
      2 points
      Parent
      What exactly does in-scope / out-of-scope mean? The OO has access to what the designer does (since the designer’s design is given to the OO), so for practical purposes, the OO is predicting the designer perfectly.
      I was definitely unclear there. What I meant is something like a (deterministic) computer game: the game designer is outside the game, the agent is an NPC inside the game, and the OO is an entity with access to the game engine. So the OO can predict the agent perfectly, but not whoever designed the agent’s algorithm. That’s the natural edge of the chain of predictability.
      But you are making the claim that there is an objective distinction. It seems to me more like a subjective one: I can look at an algorithm from a number of perspectives; some of them will be more like OO (seeing it as “just an algorithm”), while others will regard the algorithm as an agent (unable to calculate exactly what the algorithm will do, they’re forced to take the intentional stance).
      Yes, it’s more like Dennett’s intentional stance vs physical (or, in this case, algorithmic, since the universe’s physics is fully encoded in the algorithms). Definitely there are perspectives where one has to settle for the intentional stance (like the human game players do when dealing with high-level NPCs, because they are unable to calculate the NPC’s actions precisely). Whether this hypothetical game situation is isomorphic to the universe we live in is an open problem, and I do not make definite claims that it is.
      I want to distinguish “thinking about choices” from “awareness of thinking about choices” (which seems approximately like “thinking about thinking about choices”, though there’s probably more to it).
      It’s a good distinction, definitely. “Thinking about choices” is executing the decision making algorithm, including generating the algorithm itself. I was referring to thinking about the origin of both of those. It may or may not be what you are referring to.
      This seems to be where we disagree. It is not like there is a separate bit of clockwork deterministically ticking away and eventually spitting out an answer, with “us” standing off to the side and eventually learning what decision was made. We are the computation which outputs the decision. Our hand is not forced.
      Yes, that’s where we differ, in the very last sentence. There is no separate bit of an algorithm, we (or, in this case, the agents in the setup) are the algorithm. Yes, we are the computation which outputs the decision. And that’s precisely why our hand is forced! There is no other output possible even if it feels like it is.
      So it does not seem right to me to say that the making-of-decisions is only an illusion. If we did not think through the decisions, they would in fact not be made the same.
      Looks like this is the crux of the disagreement. The agents have no option not to think through the decisions. Once the universe is set in motion, the agents will execute their algorithms, including thinking through the decisions, generating the relevant abstractions, including the decision theory, then executing the decision to pay or not pay the counterfactual mugger. “If we did not think through the decisions” is not an option in this universe, except potentially as a (useless) subroutine in the agent’s algorithm. The agent will do what it is destined to do, and, while the making-of-decisions is not an illusion, since the decision is eventually made, the potential to make a different decision is definitely an illusion, just like the potential to not think through the decisions.
      So, a (more) self-consistent approach to “thinking about thinking” is “Let’s see what decision theory, if any, my algorithm will generate, and how it will apply it to the problem at hand.” I am not sure whether there is any value in this extra layer, or if there is anything that can be charitably called “value” in this setup from an outside perspective. Certainly the OO does not need the abstraction we call “value” to predict anything, they can just emulate (or analyze) the agent’s algorithm.
      So, my original point was that “Do you give Omega $100?” is not a meaningful question as stated, since it assumes you have a choice in the matter. You can phrase the question differently, and more profitably, as “Do you think that you are the sort of agent who gives Omega $100?” or “Which agents gain more expected value in this setup?” There is no freedom to “self-modify” to be an agent that pays or doesn’t pay. You are one of the two, you just don’t yet know which. The best you can do is try to discover it ahead of time.