This tends to assume that we can detangle things enough to see outcomes as a function of our actions.
No. The assumption is that an agent has *agency* over some degrees of freedom of the environment. It’s not even an assumption, really; it’s part of the definition of an agent. What is an agent with no agency?
If the agent’s actions have no influence on the state of the environment, then it can’t drive the state of the environment to satisfy any objective. The whole point of building an internal model of the environment is to understand how the agent’s actions influence the environment. In other words: “detangling things enough to see outcomes as functions of [the agent’s] actions” isn’t just an assumption, it’s essential.
The only point I can see in writing the above sentence would be if you meant that a function isn’t, in general, enough to describe the relationship between an agent’s actions and the outcome, and that you need some higher-level construct like a Turing machine. That would be fair enough, except that the theory you’re comparing yours to is AIXI, which explicitly models the relationship between actions and outcomes via Turing machines.
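For reference, the standard AIXI action-selection rule (roughly as Hutter writes it; I’m quoting it from memory, so treat the exact indexing as approximate) makes that Turing-machine relationship explicit: the environment is a program $q$ running on a universal machine $U$, and the action is chosen to maximize expected future reward under a length-weighted mixture over all programs consistent with the history so far:

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_{1:m} r_{1:m}} 2^{-\ell(q)}$$

Here $m$ is the horizon and $\ell(q)$ is the length of the program $q$. The outcome is literally a function of the action sequence, mediated by Turing machines.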
AIXI represents the agent and the environment as separate units which interact over time through clearly defined I/O channels so that it can then choose actions maximizing reward.
Do you propose a model in which the relationship between the agent and the environment are undefined?
When the agent model is part of the environment model, it can be significantly less clear how to consider taking alternative actions.
Really? It seems you’re applying magical thinking to the consequences of embedding one Turing machine within another. Why would its I/O or internal modeling change so drastically? If I use a virtual machine to run Windows within Linux, does that make the experience of using MS Paint fundamentally different than running Windows on a native boot?
...there can be other copies of the agent, or things very similar to the agent.
Depending on how you draw the boundary around “yourself”, you might think you control the action of both copies or only your own.
How is that unclear? If the agent doesn’t actually control the copies, then there’s no reason to imagine it does. If it’s trying to figure out how best to exercise its agency to satisfy its objective, then imagining it has any more agency than it actually does is silly. You don’t need to wander into the philosophical no-man’s-land of defining the “self”. It’s irrelevant. What are your degrees of freedom? How can you use them to satisfy your objective? At some point, the I/O channels *must be* well defined. It’s not like a processor has an ambiguous number of pins. It’s not like a human has an ambiguous number of motor neurons.
For all intents and purposes, the agent IS the degrees of freedom it controls. The agent can only change its state, which, being a subset of the environment’s state, changes the environment in some way. You can’t lift a box; you can only change the position of your arms. If that results in a box being lifted, good! Or maybe you can’t change the position of those arms; you can only change the electric potential on some motor neurons. If that results in arms moving, good! Play that game long enough and, at some point, the set of actions you can take is finite and clearly defined.
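To make that concrete, here’s a minimal toy sketch (my own illustration with made-up names, not anything from the post): an agent whose “agency” is nothing more than a finite, enumerable set of outputs that its policy can select from.

```python
# Toy sketch: an agent defined entirely by its finite set of output channels.
from typing import Callable, List

class Agent:
    def __init__(self, actions: List[str], policy: Callable[[str], str]):
        self.actions = actions   # finite, clearly defined degrees of freedom
        self.policy = policy     # maps an observation to one of those actions

    def act(self, observation: str) -> str:
        action = self.policy(observation)
        assert action in self.actions  # the agent can only do what its I/O allows
        return action

# "You can't lift a box, you can only change the position of your arms."
agent = Agent(
    actions=["raise_arm", "lower_arm", "do_nothing"],
    policy=lambda obs: "raise_arm" if obs == "box_present" else "do_nothing",
)
print(agent.act("box_present"))  # -> raise_arm
```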
Your five-or-ten problem is one of many that demonstrate the brittleness of logic-based systems operating in the real world. This is well known. People have all but abandoned logic-based systems in favor of stochastic systems when dealing with real-world problems, specifically because it’s effectively impossible to make a robust logic-based system.
This is the crux of a lot of your discussion. When you talk about an agent “knowing” its own actions or the “correctness” of counterfactuals, you’re talking about definitive results which a real-world agent would never have access to.
It’s possible (though unlikely) for a cosmic ray to damage your circuits, in which case you could go right—but you would then be insane.
If a rare, spontaneous occurrence causes you to go right, you must be insane? What? Is that really the only conclusion you could draw from that situation? If I take a photo and a cosmic ray causes one of the pixels to register white, do I need to throw my camera out because it might be “insane”?!
Maybe we can force exploration actions so that we learn what happens when we do things?
First of all, who is “we” in this case? Are we the agent or are we some outside system “forcing” the agent to explore?
Ideally, nobody would have to force the agent to explore its world. It would want to explore and experiment as an instrumental goal to lower uncertainty in its model of the world so that it can better pursue its objective.
A bad prior can think that exploring is dangerous
That’s not a bad prior. Exploring *is* fundamentally dangerous. You’re encountering the unknown. I’m not even sure if the risk/reward ratio of exploring is decidable. It’s certainly a hard problem to determine when it’s better to explore, and when it’s too dangerous. Millions of the most sophisticated biological neural networks the planet Earth has to offer have grappled with the question for hundreds of years with no clear answer.
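To be clear about why that isn’t a “bad prior”, here’s a toy expected-value calculation (numbers entirely of my own choosing): whether exploring is worth it depends completely on the prior over what the unknown action might do.

```python
# Toy numbers: expected payoff of trying an unknown action under two priors.
def explore_value(p_treasure: float, p_trap: float,
                  treasure: float = 10.0, trap: float = -1000.0) -> float:
    """Expected payoff of the unknown action under a given prior."""
    return p_treasure * treasure + p_trap * trap

known_value = 1.0  # payoff of the action we already understand
print(explore_value(p_treasure=0.5, p_trap=0.001))  # 4.0   -> worth exploring
print(explore_value(p_treasure=0.5, p_trap=0.05))   # -45.0 -> stick with what you know
```

Neither prior is “bad”; they just encode different beliefs about how hostile the unknown is.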
Forcing it to take exploratory actions doesn’t teach it what the world would look like if it took those actions deliberately.
What? Again, *who* is doing the “forcing” in this situation, and how? Do you really want to tread into the other philosophical no-man’s-land of free will? Why would the question of whether the agent really wanted to take an action have any bearing whatsoever on the result of that action? I’m so confused about what this sentence even means.
EDIT: It’s also unclear to me what the point of the discussion on counterfactuals is. Counterfactuals are of dubious utility for short-term evaluation of outcomes, and they become less useful the further you separate the action from the result in time. I could think, “Damn! I should have taken an alternate route to work this morning!”, which is arguably useful and may actually be wrong; but if I think, “Damn, if Eric the Red hadn’t sailed to the New World, Hitler would never have risen to power!”, that’s not only extremely questionable, but also of what use would that pondering be even if it were correct?
It seems like you’re saying an embedded agent can’t enumerate the possible outcomes of its actions before taking them, so it can only do so in retrospect. In which case, why can’t an embedded agent perform a pre-emptive tree search like any other agent? What’s the point of counterfactuals?
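For instance, the kind of pre-emptive tree search I have in mind is nothing exotic (a toy sketch with a made-up model and utility, just to illustrate):

```python
# Toy sketch: evaluate outcomes *before* acting by rolling a model forward.
def tree_search(state, model, actions, utility, depth):
    """Return (best_action, best_value) by simulating the model forward."""
    if depth == 0:
        return None, utility(state)
    best_action, best_value = None, float("-inf")
    for a in actions:
        next_state = model(state, a)  # predicted consequence of taking a
        _, value = tree_search(next_state, model, actions, utility, depth - 1)
        if value > best_value:
            best_action, best_value = a, value
    return best_action, best_value

# A one-dimensional world where the objective is to reach position 3.
model = lambda s, a: s + (1 if a == "right" else -1)
utility = lambda s: -abs(3 - s)
print(tree_search(0, model, ["left", "right"], utility, depth=3))  # ('right', 0)
```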
At some point, the I/O channels *must be* well defined.
This statement is precisely what is being challenged—and for good reason: it’s untrue. The reason it’s untrue is because the concept of “I/O channels” does not exist within physics as we know it; the true laws of physics make no reference to inputs, outputs, or indeed any kind of agents at all. In reality, that which is considered a computer’s “I/O channels” are simply arrangements of matter and energy, the same as everything else in our universe. There are no special XML tags attached to those configurations of matter and energy, marking them “input”, “output”, “processor”, etc. Such a notion is unphysical.
Why might this distinction be important? It’s important because an algorithm that is implemented on physically existing hardware can be physically disrupted. Any notion of agency which fails to account for this possibility—such as, for example, AIXI, which supposes that the only interaction it has with the rest of the universe is by exchanging bits of information via the input/output channels—will fail to consider the possibility that its own operation may be disrupted. A physical implementation of AIXI would have no regard for the safety of its hardware, since it has no means of representing the fact that the destruction of its hardware equates to its own destruction.
AIXI also fails on various decision problems that involve leaking information via a physical side channel that it doesn’t consider part of its output; for example, it has no regard for the thermal emissions it may produce as a side effect of its computations. In the extreme case, AIXI is incapable of conceptualizing the possibility that an adversarial agent may be able to inspect its hardware, and hence “read its mind”. This reflects a broader failure on AIXI’s part: it is incapable of representing an entire class of hypotheses—namely, hypotheses that involve AIXI itself being modeled by other agents in the environment. This is, again, because AIXI is defined using a framework that makes it unphysical: the classical definition of AIXI is uncomputable, making it too “big” to be modeled by any (part) of the Turing machines in its hypothesis space. This applies even to computable formulations of AIXI, such as AIXI-tl: they have no way to represent the possibility of being simulated by others, because they assume they are too large to fit in the universe.
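To spell out the “too big” point with the usual notation (stated from memory, so take the exact weighting as approximate): AIXI predicts with the Solomonoff-style mixture

$$\xi(x_{1:n}) \;=\; \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\, \nu(x_{1:n}),$$

where $\mathcal{M}$ is a class of (semi)computable environment measures. AIXI itself is incomputable, so no $\nu \in \mathcal{M}$ can model it; and as noted above, the bounded variants run into an analogous mismatch between their own size and the hypotheses they can entertain.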
I’m not sure what exactly is so hard to understand about this, considering the original post conveyed all of these ideas fairly well. It may be worth considering the assumptions you’re operating under—and in particular, making sure that the post itself does not violate those assumptions—before criticizing said post based on those assumptions.
The reason it’s untrue is because the concept of “I/O channels” does not exist within physics as we know it.
Yes. They most certainly do. The only truly consistent interpretation I know of current physics is information-theoretic anyway, but I’m not interested in debating any of that. The fact is I’m communicating with you through physical I/O channels right now, so I/O channels certainly exist in the real world.
the true laws of physics make no reference to inputs, outputs, or indeed any kind of agents at all.
Agents are emergent phenomena. They don’t exist at the level of particles and waves. The concept is an abstraction.
“I/O channels” are simply arrangements of matter and energy, the same as everything else in our universe. There are no special XML tags attached to those configurations of matter and energy, marking them “input”, “output”, “processor”, etc. Such a notion is unphysical.
An I/O channel doesn’t imply modern computer technology. It just means information is collected from or imprinted upon the environment. It could be ant pheromones, it could be smoke signals; its physical implementation is secondary to the abstract concept of sending and receiving information of some kind. You’re not seeing the forest for the trees. Information most certainly does exist.
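As a quick illustration of what I mean (my own toy code, obviously not from the post): the interface below says nothing whatsoever about the physical medium that implements it.

```python
# Toy sketch: an "I/O channel" as an abstraction over any physical substrate.
from typing import Protocol

class Channel(Protocol):
    def send(self, message: str) -> None: ...
    def receive(self) -> str: ...

class SmokeSignals:
    def send(self, message: str) -> None:
        print(f"puffing out: {message}")
    def receive(self) -> str:
        return "observed puff pattern"

class AntPheromones:
    def send(self, message: str) -> None:
        print(f"laying trail encoding: {message}")
    def receive(self) -> str:
        return "sensed pheromone gradient"

def communicate(channel: Channel, message: str) -> str:
    channel.send(message)     # imprint information on the environment
    return channel.receive()  # collect information from the environment

communicate(SmokeSignals(), "danger")  # same abstract channel,
communicate(AntPheromones(), "food")   # different physical implementation
```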
Why might this distinction be important? It’s important because an algorithm that is implemented on physically existing hardware can be physically disrupted. Any notion of agency which fails to account for this possibility—such as, for example, AIXI, which supposes that the only interaction it has with the rest of the universe is by exchanging bits of information via the input/output channels—will fail to consider the possibility that its own operation may be disrupted.
I’ve explained in previous posts that AIXI is a special case of AIXI-tl. AIXI-tl can be conceived of in an embedded context, in which case its model of the world would include a model of itself, which is subject to any sort of environmental disturbance.
To some extent, an agent must trust its own operation to be correct, because you quickly run into infinite regress if the agent is modeling all the possible ways that it could be malfunctioning. What if the malfunction affects the way it models the possible ways it could malfunction? It should model all the ways a malfunction could disrupt how it models all the ways it could malfunction, right? It’s like saying “well, the agent could malfunction, so it should be aware that it can malfunction so that it never malfunctions”. If the thing malfunctions, it malfunctions; it’s as simple as that.
Aside from that, AIXI is meant to be a purely mathematical formalization, not a physical implementation. It’s an abstraction by design. It’s meant to be used as a mathematical tool for understanding intelligence.
AIXI also fails on various decision problems that involve leaking information via a physical side channel that it doesn’t consider part of its output; for example, it has no regard for the thermal emissions it may produce as a side effect of its computations.
Do you consider how the 30 watts leaking out of your head might affect your plans every day? I mean, it might cause a typhoon in Timbuktu! If you don’t consider how the waste heat produced by your mental processes affects your environment while making long- or short-term plans, you must not be a real intelligent agent...
In the extreme case, AIXI is incapable of conceptualizing the possibility that an adversarial agent may be able to inspect its hardware, and hence “read its mind”.
AIXI can’t play tic-tac-toe with itself, because that would mean it would have to model itself as part of the environment, which it can’t do. Yes, I know there are fundamental problems with AIXI...
This is, again, because AIXI is defined using a framework that makes it unphysical
No. It’s fine to formalize something mathematically. People do it all the time. Math is a perfectly valid tool to investigate phenomena. The problem with AIXI proper is that it’s limited to a context in which the agent and environment are independent entities. There are actually problems where that is a decent approximation, but it would be better to have a more general formulation, like AIXI-tl, that can be applied to contexts in which an agent is embedded in its environment.
This applies even to computable formulations of AIXI, such as AIXI-tl: they have no way to represent the possibility of being simulated by others, because they assume they are too large to fit in the universe.
That’s simply not true.
I’m not sure what exactly is so hard to understand about this, considering the original post conveyed all of these ideas fairly well. It may be worth considering the assumptions you’re operating under—and in particular, making sure that the post itself does not violate those assumptions—before criticizing said post based on those assumptions.
I didn’t make any assumptions. I said what I believe to be correct.
I’d love to hear you or the author explain how an agent is supposed to make decisions about what to do in an environment if its agency is completely undefined.
I’d also love to hear your thoughts on the relationship between math, science, and the real world if you think comparing a physical implementation to a mathematical formalization is any more fruitful than comparing apples to oranges.
Did you know that engineers use the “ideal gas law” every day to solve real-world problems even though they know that no real-world gas actually follows the “ideal gas law”?! You should go tell them that they’re doing it wrong!
*Yes, it is. The fact that it is an abstraction is precisely why it breaks down under certain circumstances.
An I/O channel doesn’t imply modern computer technology. It just means information is collected from or imprinted upon the environment. It could be ant pheromones, it could be smoke signals; its physical implementation is secondary to the abstract concept of sending and receiving information of some kind. You’re not seeing the forest for the trees. Information most certainly does exist.
The claim is not that “information” does not exist. The claim is that input/output channels are in fact an abstraction over more fundamental physical configurations. Nothing you wrote contradicts this, so the fact that you seem to think what I wrote was somehow incorrect is puzzling.
I’ve explained in previous posts that AIXI is a special case of AIXI-tl. AIXI-tl can be conceived of in an embedded context,
Yes.
in which case its model of the world would include a model of itself, which is subject to any sort of environmental disturbance
*No. AIXI-tl explicitly does not model itself or seek to identify itself with any part of the Turing machines in its hypothesis space. The very concept of self-modeling is entirely absent from AIXI’s definition, and AIXI-tl, being a variant of AIXI, does not include said concept either.
To some extent, an agent must trust its own operation to be correct, because you quickly run into infinite regress if the agent is modeling all the possible ways that it could be malfunctioning. What if the malfunction affects the way it models the possible ways it could malfunction? It should model all the ways a malfunction could disrupt how it models all the ways it could malfunction, right? It’s like saying “well, the agent could malfunction, so it should be aware that it can malfunction so that it never malfunctions”. If the thing malfunctions, it malfunctions; it’s as simple as that.
*This is correct, so far as it goes, but what you neglect to mention is that AIXI makes no attempt to preserve its own hardware. It’s not just a matter of “malfunctioning”; humans can “malfunction” as well. However, the difference between humans and AIXI is that we understand what it means to die, and go out of our way to make sure our bodies are not put in undue danger. Meanwhile, AIXI will happily allow its hardware to be destroyed in exchange for the tiniest increase in reward. I don’t think I’m being unfair when I suggest that this behavior is extremely unnatural, and is not the kind of thing most people intuitively have in mind when they talk about “intelligence”.
Aside from that, AIXI is meant to be a purely mathematical formalization, not a physical implementation. It’s an abstraction by design. It’s meant to be used as a mathematical tool for understanding intelligence.
*Abstractions are useful for their intended purpose, nothing more. AIXI was formulated as an attempt to describe an extremely powerful agent, perhaps the most powerful agent possible, and it serves that purpose admirably so long as we restrict analysis to problems in which the agent and the environment can be cleanly separated. As soon as that restriction is removed, however, it’s obvious that the AIXI formalism fails to capture various intuitively desirable behaviors (e.g. self-preservation, as discussed above). As a tool for reasoning about agents in the real world, therefore, AIXI is of limited usefulness. I’m not sure why you find this idea objectionable; surely you understand that all abstractions have their limits?
Do you consider how the 30 watts leaking out of your head might affect your plans every day? I mean, it might cause a typhoon in Timbuktu! If you don’t consider how the waste heat produced by your mental processes affects your environment while making long- or short-term plans, you must not be a real intelligent agent...
Indeed, you are correct that waste heat is not much of a factor when it comes to humans. However, that does not mean that the same holds true for advanced agents running on powerful hardware, especially if such agents are interacting with each other; who knows what can be deduced from various side outputs, if a superintelligence is doing the deducing? Regardless of the answer, however, one thing is clear: AIXI does not care.
This seems to address the majority of your points, and the last few paragraphs of your comment seem mainly to be reiterating/elaborating on those points. As such, I’ll refrain from replying in detail to everything else, in order not to make this comment longer than it already is. If you respond to me, you needn’t feel obligated to reply to every individual point I made, either. I marked what I view as the most important points of disagreement with an asterisk*, so if you’re short on time, feel free to respond only to those.