Correct. I am trying to pin down exactly what you mean by an agent controlling a logical statement. To that end, I ask whether an agent that takes an action iff a statement is true controls the statement through choosing whether to take the action. (“The Killing Curse doesn’t crack your soul. It just takes a cracked soul to cast.”)
Perhaps we could equip logic with a “causation” preorder such that all tautologies are equivalent, causation implies implication, and whenever we define an agent, we equip its control circuits with causation. Then we could say that A doesn’t cross the bridge because it’s not insane. (I perhaps contentiously assume that insanity and proving sanity are causally equivalent.)
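A minimal sketch of the conditions I have in mind, writing $A \rightsquigarrow B$ for "$A$ causes $B$" (the notation and exact axioms are just a first pass on my part):

$$A \rightsquigarrow A, \qquad (A \rightsquigarrow B) \wedge (B \rightsquigarrow C) \;\Rightarrow\; (A \rightsquigarrow C) \quad \text{(preorder)}$$

$$(A \rightsquigarrow B) \;\Rightarrow\; (A \rightarrow B) \quad \text{(causation implies implication)}$$

$$\vdash A \;\text{ and }\; \vdash B \;\Rightarrow\; (A \rightsquigarrow B) \wedge (B \rightsquigarrow A) \quad \text{(tautologies are causally equivalent)}$$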
If we really wanted to, we could investigate the agent that only accepts utility proofs that don’t go causally backwards. (Or rather, it requires that its action provably causes the utility.)
You claimed this reasoning is unwise in chess. Can you give a simple example illustrating this?
> Correct. I am trying to pin down exactly what you mean by an agent controlling a logical statement. To that end, I ask whether an agent that takes an action iff a statement is true controls the statement through choosing whether to take the action. (“The Killing Curse doesn’t crack your soul. It just takes a cracked soul to cast.”)
The point here is that the agent described is acting like EDT is supposed to—it is checking whether its action implies X. If yes, it is acting as if it controls X in the sense that it is deciding which action to take using those implications. I’m not arguing at all that we should think “implies X” is causal, nor even that the agent has opinions on the matter; only that the agent seems to be doing something wrong, and one way of analyzing what it is doing wrong is to take a CDT stance and say “the agent is behaving as if it controls X”—in the same way that CDT says to EDT “you are behaving as if correlation implies causation” even though EDT would not assent to this interpretation of its decision.
> If we really wanted to, we could investigate the agent that only accepts utility proofs that don’t go causally backwards. (Or rather, it requires that its action provably causes the utility.)
> You claimed this reasoning is unwise in chess. Can you give a simple example illustrating this?
I think you have me the wrong way around; I was suggesting that certain causally-backwards reasoning would be unwise in chess, not the reverse. In particular, I was suggesting that we should not judge a move poor because we think the move is something only a poor player would do; it should always be the other way around. For example, suppose we have a prior on moves which suggests that moving a queen into danger is something only a poor player would do. Further suppose we are in a position to move our queen into danger in a way which forces checkmate in 4 moves. I’m saying that if we reason, “I could move my queen into danger to open up a path to checkmate in 4. However, only poor players move their queen into danger. Poor players would not successfully navigate a checkmate-in-4. Therefore, if I move my queen into danger, I expect to make a mistake costing me the checkmate in 4. Therefore, I will not move my queen into danger,” then that is an example of the mistake I was pointing at.
Note: I do not personally endorse this as an argument for CDT! I am expressing these arguments because they are part of the significance of Troll Bridge. I think these arguments are the kinds of things one should grapple with if one is grappling with Troll Bridge. I have defended EDT from these kinds of critiques extensively elsewhere. My defenses do not work against Troll Bridge, but they do work against the chess example. But I’m not going into those defenses here because it would distract from the points relevant to Troll Bridge.
If I’m a poor enough player that I merely have evidence, not proof, that the queen move mates in four, then the heuristic that queen sacrifices usually don’t work out is fine, and I might use it in real life. If I can prove that queen sacrifices don’t work out, the reasoning is fine even for a proof-requiring agent. Can you give a chess-like game where some proof-requiring agent can prove from the rules, and perhaps the players’ source code, that queen sacrifices don’t work out, and therefore scores worse than some other agent would have? (Perhaps through mechanisms as in Troll Bridge.)
The heuristic can override mere evidence, agreed. The problem I’m pointing at isn’t that the heuristic is fundamentally bad and shouldn’t be used, but rather that it shouldn’t circularly reinforce its own conclusion by counting a hypothesized move as differentially suggesting you’re a bad player in the hypothetical where you make that move. Thinking that way seems contrary to the spirit of the hypothetical (whose purpose is to help evaluate the move). It’s fine for the heuristic to suggest things are bad in that hypothetical (because you heuristically think the move is bad); it seems much more questionable to suppose that your subsequent moves will be worse in that hypothetical, particularly if that inference is a linchpin of your overall negative assessment of the move.
What do you want out of the chess-like example? Is it enough for me to say the troll could be the other player, and the bridge could be a strategy which you want to employ? (The other player defeats the strategy if they think you did it for a dumb reason, and they let it work if they think you did it smartly; they know you well, but you don’t know whether they think you’re dumb, though you do know that if you were being dumb then you would use the strategy.) This can be exactly Troll Bridge as stated in the post, but set in chess with player source code visible.
I’m guessing that’s not what you want, but I’m not sure what you want.
I started asking for a chess example because you implied that the reasoning in the top-level comment stops being sane in iterated games.
In a simple iteration of Troll Bridge, whether we’re dumb is clear after the first time we cross the bridge. In a simple variation, the troll requires smartness even given past observations. In either case, the best worst-case utility bound requires never crossing the bridge, and A knows that crossing blows A up. You seemed to expect more.
Suppose my chess skill varies by day. If my last few moves were dumb, I shouldn’t rely on my skill today. I don’t see why I shouldn’t deduce this ahead of time and, until I know I’m smart today, be extra careful around moves that look extra good to dumb players and are extra bad.
More concretely: Suppose that an unknown weighting of three subroutines approval-votes on my move: Timmy likes moving big pieces, Johnny likes playing good chess, and Spike tries to win in this meta. Suppose we start with moves A, B, and C available. A and B lead to a Johnny gambit that Timmy would ruin. Johnny thinks “If I play alone, A and B lead to 80% win probability and C to 75%. I approve exactly A and B.” Timmy gives 0, 0.2, and 1 of his maximum vote to A, B, and C respectively. Spike wants the gambit to happen iff Spike and Johnny can outvote Timmy. Spike wants to vote for A and against B. How hard Spike votes for C trades off between his test’s false positive and false negative rates. If B wins, ruin is likely. Spike’s reasoning seems to require those hypothetical skill updates you don’t like.
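A quick numerical sketch of that tradeoff (the two weightings and the values of Spike’s vote on C below are invented purely for illustration; only Timmy’s and Johnny’s votes come from the setup above):

```python
# Toy model of the approval vote over moves A, B, C.
# Timmy likes moving big pieces; Johnny approves exactly his gambit moves A and B;
# Spike backs A, rejects B, and votes for C with some strength c in [0, 1].
VOTES = {
    "Timmy":  {"A": 0.0, "B": 0.2, "C": 1.0},
    "Johnny": {"A": 1.0, "B": 1.0, "C": 0.0},
}

def winning_move(weights, c):
    votes = dict(VOTES, Spike={"A": 1.0, "B": 0.0, "C": c})
    totals = {m: sum(weights[p] * votes[p][m] for p in votes) for m in "ABC"}
    return max(totals, key=totals.get)

# Two weightings Spike can't tell apart in advance (numbers invented):
weightings = {
    "Johnny+Spike outvote Timmy (gambit should happen)":
        {"Timmy": 0.45, "Johnny": 0.35, "Spike": 0.20},
    "Timmy too strong (gambit would be ruined)":
        {"Timmy": 0.50, "Johnny": 0.44, "Spike": 0.06},
}
for label, w in weightings.items():
    for c in (0.0, 1.0):
        print(f"{label}; Spike votes {c} for C -> {winning_move(w, c)} wins")

# With c = 0, the second weighting elects B (the gambit starts, but Timmy is strong
# enough that ruin is likely); raising c to 1 elects C there instead, but in the
# first weighting it also blocks A, the gambit that would have worked -- the
# false-positive / false-negative tradeoff in Spike's vote on C.
```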
> I started asking for a chess example because you implied that the reasoning in the top-level comment stops being sane in iterated games.
> In a simple iteration of Troll Bridge, whether we’re dumb is clear after the first time we cross the bridge.
Right, OK. I would say “sequential” rather than “iterated”—my point was about making a weird assessment of your own future behavior, not what you can do if you face the same scenario repeatedly. IE: Troll Bridge might be seen as artificial in that the environment is explicitly designed to punish you if you’re “dumb”; but, perhaps a sequential game can punish you more naturally by virtue of poor future choices.
> Suppose my chess skill varies by day. If my last few moves were dumb, I shouldn’t rely on my skill today. I don’t see why I shouldn’t deduce this ahead of time
Yep, I agree with this.
I concede the following points:
If there is a mistake in the troll-bridge reasoning, predicting that your next actions are likely to be dumb conditional on a dumb-looking action is not an example of the mistake.
Furthermore, that inference makes perfect sense, and if it is as analogous to the troll-bridge reasoning as I was previously suggesting, the troll-bridge reasoning makes sense.
However, I still assert the following:
Predicting that your next actions are likely to be dumb conditional on a dumb-looking action doesn’t make sense if the very reason why you think the action looks dumb is that the next actions are probably dumb if you take it.
IE, you don’t have a prior heuristic judgement that a move is one which you make when you’re dumb; rather, you’ve circularly concluded that the move would be dumb—because it’s likely to lead to a bad outcome—because if you take that move your subsequent moves are likely to be bad—because it is a dumb move.
I don’t have a natural setup which would lead to this, but the point is that it’s a crazy way to reason rather than a natural one.
The question, then, is whether the troll-bridge reasoning is analogous to this.
I think we should probably focus on the probabilistic case (recently added to the OP), rather than the proof-based agent. I could see myself deciding that the proof-based agent is more analogous to the sane case than the crazy one. But the probabilistic case seems completely wrong.
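To have it in front of us, here is the Löbian argument from the proof-based version as I understand it, writing $\Box$ for provability in the agent’s theory and $C$ for “the agent crosses” (only a sketch; see the OP for the real thing). Reasoning within the agent’s own theory: suppose $\Box(C \to \text{bad})$. The agent only crosses if it has proved that crossing is good, so under that supposition crossing would mean the theory proves crossing both good and bad, i.e. is inconsistent; and the troll blows up the bridge exactly when the agent crosses for that kind of reason, so $C \to \text{bad}$. The theory has thus proved

$$\Box(C \to \text{bad}) \;\to\; (C \to \text{bad}),$$

and Löb’s theorem turns this into $\Box(C \to \text{bad})$, whereupon the agent refuses to cross.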
In the proof-based case, the question is: do we see the Löbian proof as “circular” in a bad way? It makes sense to conclude that you’d only cross the bridge when it is bad to do so, if you can see that proving it’s a good idea is inconsistent. But does the proof that that’s inconsistent “go through” that very inference? We know that the troll blows up the bridge if we’re dumb, but that in itself doesn’t constitute an outside reason to think crossing is dumb.
But I can see an argument that our “outside reason” is that we can’t know that crossing is safe, and, since we’re a proof-based agent, we would never take the risk unless we’re being dumb.
However, this reasoning does not apply to the probabilistic agent. It can cross the bridge as a calculated risk. So its reasoning seems absolutely circular. There is no “prior reason” for it to think crossing is dumb; and, even if it did think it more likely dumb than not, it doesn’t seem like it should be 100% certain of that. There should be some utilities for the three outcomes which preserve the preference ordering but which make the risk of crossing worthwhile.
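For instance (utilities invented for illustration, preserving blow-up < don’t-cross < cross-safely): take $u(\text{blow-up}) = -1$, $u(\text{don't cross}) = 0$, $u(\text{cross safely}) = 100$, and let $p$ be the agent’s credence that crossing gets it blown up. Then

$$\mathbb{E}[u \mid \text{cross}] = 100(1 - p) - p = 100 - 101p,$$

which exceeds $\mathbb{E}[u \mid \text{don't cross}] = 0$ for every $p < 100/101$, so only near-certainty that crossing is dumb would make refusing to cross the better gamble.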