Scott Garrabrant comments on Decision Theory

Scott Garrabrant Nov 1, 2018, 9:43 PM
LW: 11 AF: 5
AF
Yeah, so it’s like you have this private data, which is an infinite sequence of bits, and if you see all 0s you take an exploration action. I think that by giving the agent these private bits and promising that the bits do not change the rest of the world, you are essentially giving the agent access to a causal counterfactual that you constructed. You don’t even have to mix with what the agent actually does; you can explore with every action and ask if it is better to explore and take 5 or explore and take 10. This works because conditioning on these infinitesimals is basically like coming in and changing what the agent does. I think giving the agent a true source of randomness actually does let you implement CDT.
If the environment learns from the other possible worlds, it might punish or reward you in one world for stuff that you do in the other world, so you can’t just ask which world is best to figure out what to do.
I agree that that is how you want to think about the matching pennies problem. However the point is that your proposed solution assumed linearity. It didn’t empirically observe linearity. You have to be able to tell the difference between the situations in order to know not to assume linearity in the matching pennies problem. The method for telling the difference is how you determine whether or not and in what ways you have logical control over Omega’s prediction of you.
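To see why linearity cannot simply be assumed here, a minimal sketch, assuming (as in the formula that appears later in this thread) that the payoff for playing heads with probability p against a predictor who knows p is min(p, 1 - p). The function names and the grid of probabilities are illustrative choices, not part of anyone's proposal.

```python
from fractions import Fraction


def predicted_payoff(p):
    """Assumed payoff against a predictor that knows p: min(p, 1 - p)."""
    return min(p, 1 - p)


def linear_guess(p, u0, u1):
    """Payoff a linearity assumption would extrapolate from the two pure actions."""
    return (1 - p) * u0 + p * u1


# Exact rational grid of mixing probabilities 0, 1/10, ..., 1.
grid = [Fraction(k, 10) for k in range(11)]

# Payoffs of the two pure actions (p = 0 and p = 1).
u0 = predicted_payoff(Fraction(0))
u1 = predicted_payoff(Fraction(1))

# Best mixing probability under the assumed payoff.
best_p = max(grid, key=predicted_payoff)

# How far the true payoff sits above the linear extrapolation at each p.
gap = {p: predicted_payoff(p) - linear_guess(p, u0, u1) for p in grid}

print(best_p, gap[best_p])
```

Under these assumptions both pure actions score 0, so a linear extrapolation predicts 0 everywhere, while the assumed payoff peaks in the interior. An agent that only evaluated the pure actions and assumed linearity would miss this.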
- Gurkenglas Nov 1, 2018, 10:55 PM
  LW: 3 AF: 2
  AF Parent
  I posit that linearity always holds. In a deterministic universe, the linear function is between the ε-adjoined open affine space generated by our primitive set of actions and the ε-adjoined utilities. (Like in my first comment.)
  
  In a probabilistic universe, the linear function is between the ε-adjoined open affine space generated by (the set of points in) the closed affine space generated by our primitive set of actions and the ε-adjoined utilities. (Like in my second comment.)
  
  I got from one of your comments that assuming linearity wards off some problem. Does it come back in the probabilistic-universe case?
  - Scott Garrabrant Nov 1, 2018, 11:53 PM
    LW: 9 AF: 5
    AF Parent
    My point was that I don’t know where to assume the linearity is. Whenever I have private randomness, I have linearity over what I end up choosing with that randomness, but not linearity over what probability I choose. But I think this is not getting at the disagreement, so I pivot to:
    In your model, what does it mean to prove that U is some linear affine function? If I prove that my probability p is ¹⁄₂ and that U=7.5, have I proven that U is the constant function 7.5? If there is only one value of p, it is not defined what the utility function is, unless I successfully carve the universe in such a way as to let me replace the action with various things and see what happens (or, assuming linearity, replace the probability with enough linearly independent things (in this case 2) to define the function).
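    As a worked illustration, write the candidate function as $U(p) = a + bp$ (the names $a$ and $b$ are introduced here only for the example). The single proven point gives one constraint on two unknowns:

    $$U\left(\tfrac{1}{2}\right) = a + \tfrac{b}{2} = 7.5$$

    This is satisfied by the constant function $U(p) = 7.5$, but equally by $U(p) = 5 + 5p$ and by $U(p) = 10 - 5p$. A second value, for example $U(0)$, would fix $a$ and then $b = 2(7.5 - a)$, which is why two linearly independent evaluation points are needed.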
    - Gurkenglas Nov 2, 2018, 3:14 PM
      LW: 4 AF: 3
      AF Parent
      In the matching pennies game, $U()$ would be proven to be $\int A()(p) \cdot \min(p, 1 - p) \, dp$. A could maximize this by returning ε when $p$ isn’t $\frac{1}{2}$, and $1 - \int ε \, dp$ (where ε is so small that this is still infinitesimally close to 1) when $p$ is $\frac{1}{2}$.
      The linearity is always in the function between ε-adjoined open affine spaces. Whether the utilities also end up linear in the closed affine space (ie nobody cares about our reasoning process) is for the object-level information gathering process to deduce from the environment.
      You never prove that you will with certainty decide $p = \frac{1}{2}$. You always leave a so-you’re-saying-there’s-a-chance of exploration, which produces a grain of uncertainty. To execute the action, you inspect the ceremonial Boltzmann Bit, which is implemented by being constantly set to “discard the ε”, but which you treat as having an ε chance of flipping.
      The self-modification module could note that inspecting that bit is a no-op, see that removing it would make the counterfactual reasoning module crash, and leave up the Chesterton fence.
      - Scott Garrabrant Nov 2, 2018, 4:53 PM
        LW: 5 AF: 3
        AF Parent
        But how do you avoid proving with certainty that p=1/2?
        Your proposal does not say what to do if you find inconsistent proofs that the linear function is two different things, so for the following I will assume that if it finds multiple different proofs, it defaults to 5.
        Here is another example:
        You are in a 5 and 10 problem. You have a twin that is also in a 5 and 10 problem. You have exactly the same source code. There is a consistency checker, and if you and your twin do different things, you both get 0 utility.
        You can prove that you and your twin do the same thing. Thus you can prove that the function is 5+5p. You can also prove that your twin takes 5 by Löb’s theorem. (You can also prove that you take 5 by Löb’s theorem, but you ignore that proof, since “there is always a chance”.) Thus, you can prove that the function is 5-5p. Your system doesn’t know what to do with two functions, so it defaults to 5. (If it is provable that you both take 5, you both take 5, completing the proof by Löb.)
        I am doing the same thing as before, but because I put it outside of the agent, it does not get flagged with the “there is always a chance” module. This is trying to illustrate that your proposal takes advantage of a separation between the agent and the environment that was snuck in, and could be done incorrectly.
        Two possible fixes:
        1) You could say that the agent, instead of taking 5 when it finds an inconsistency, takes some action that exhibits the inconsistency (an action on which the two functions give different values). This is very similar to the chicken rule, and if you add something like this, you don’t really need the rest of your system. Take an agent that, whenever it proves it does something, does something else. This agent will prove (given enough time) that if it takes 5 it gets 5, and if it takes 10 it gets 10.
        2) I had one proof system, and just ignored the proofs that I found that I did a thing. I could instead give the agent a special proof system that is incapable of proving what it does, but how do you do that? The chicken rule seems like the place to start.
        One problem with the chicken rule is that it was developed in a system that was deductively closed, so you can’t prove something that passes through a proof of P without proving P. If you violate this, by having a random theorem prover, you might have a system that fails to prove “I take 5” but proves “I take 5 and 1+1=2” and uses this to complete the Löb loop.
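        To spell out where the two functions in the twin example disagree, reading $p$ as the probability of taking 10 (an assumption about the notation, chosen because it matches both formulas):

        $$f_1(p) = 5 + 5p, \qquad f_2(p) = 5 - 5p, \qquad f_1(p) - f_2(p) = 10p$$

        The two agree only at $p = 0$ (both give 5). At $p = 1$ they give 10 and 0 respectively, so under fix 1 any action with $p > 0$, and taking 10 in particular, would exhibit the inconsistency.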
        - Gurkenglas Nov 2, 2018, 7:40 PM
          LW: 1 AF: 1
          AF Parent
          I can’t prove what I’m going to do and I can’t prove that I and the twin are going to do the same thing, because of the Boltzmann Bits in both of our decision-makers that might turn out different ways. But I can prove that we have a $1 - 2 ε + 2 ε^{2}$ chance of doing the same thing, and my expected utility is $(1 - ε)^{2} \cdot 10 + ε^{2} \cdot 5$, rounding to $10$ once it actually happens.
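          These figures can be checked step by step. This assumes each Boltzmann Bit flips independently with probability ε, that an unflipped agent takes 10 and a flipped one takes 5, and that a mismatch pays 0 via the consistency checker. That is a reading of the setup which reproduces the stated formulas, not something spelled out above.

          $$P(\text{same}) = (1 - ε)^{2} + ε^{2} = 1 - 2ε + 2ε^{2}$$

          $$E[U] = 10(1 - ε)^{2} + 5ε^{2} + 0 \cdot 2ε(1 - ε) = 10 - 20ε + 15ε^{2}$$

          Both correction terms are infinitesimal, so the expected utility is infinitesimally close to 10.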
      - Douglas_Reay Sep 24, 2019, 4:22 AM
        1 point
        Parent
        It sounds similar to the matrices in the post:
        A solvable Newcomb-like problem