I’m confused by this comment. The post assumes that humans can solve all problems in P (in fact, all problems solvable in polynomial time given access to an oracle for M), then proves that various protocols can solve tricky problems by getting the human player to solve problems in P that in reality aren’t typical human problems to solve. Therefore, I take P to actually mean P, rather than being an analogy for problems humans can solve.
Also, regarding your version of premise 1, I think I buy that AI can only give you a polynomial speedup over humans.
I’ve always understood it as an analogy (e.g. from the debate paper, “Although debate is intended for use with fuzzy humans as judges, we can gain intuition about the model by replacing the human with an arbitrary polynomial time algorithm”). But if I take your perspective, then we have:
Premise 0: Humans can solve all problems in P.
Premise 1: We can’t build physical machines that solve problems outside of P.
Conclusion: We can’t build physical machines that can solve problems that humans can’t.
Seems like you should probably ditch one of the premises.
(I do think premise 0 is pretty wrong—I can solve arbitrary SAT problems that have < 10 clauses, despite it being NP-complete, but I can’t multiply a trillion numbers together, even though that is in P. Complexity classes impose scaling limits; humans have resource limits; these are very different. Yes, I do know that “SAT problems with < 10 clauses” is technically in P.)
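To spell out the sense in which that restriction is in P: with a constant number of clauses you can try every way of picking one literal per clause and check that the picks don’t contradict each other, which is only polynomially many combinations once the clause count is fixed. A toy sketch, purely illustrative (the function name is made up, and this is not from the post or anyone’s comment):

```python
from itertools import product

def sat_few_clauses(clauses):
    """Satisfiability for a CNF formula with a constant number of clauses.

    Each clause is a list of signed ints (3 means x3, -3 means NOT x3).
    A CNF formula is satisfiable iff we can pick one literal per clause
    such that no two picks contradict each other. With k clauses of
    width <= m that is at most m**k combinations, which is polynomial
    in the input size once k is a constant -- the only sense in which
    "SAT with < 10 clauses" is in P.
    """
    for picks in product(*clauses):                   # one literal chosen per clause
        if all(-lit not in picks for lit in picks):   # no pick contradicts another
            return True
    return False

# (x1 OR NOT x2) AND (x2 OR x3) -> satisfiable
print(sat_few_clauses([[1, -2], [2, 3]]))   # True
# (x1) AND (NOT x1) -> unsatisfiable
print(sat_few_clauses([[1], [-1]]))         # False
```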
Also, regarding your version of premise 1, I think I buy that AI can only give you a polynomial speedup over humans.
I think this is easily handled by saying “in practice, the models we train will not be literally optimal”. It’s still useful to know what would happen at optimality (e.g. I do in fact have more trust for deep RL algorithms that have proofs of convergence to optimal policies given infinite model capacity, compute and data).
I read the post as attempting to be literal, ctrl+F-ing “analog” doesn’t get me anything until the comments. Also, the post is the one that I read as assuming for the sake of analysis that humans can solve all problems in P, I myself wouldn’t necessarily assume that.
Also, regarding your version of premise 1, I think I buy that AI can only give you a polynomial speedup over humans.
I think this is easily handled by saying “in practice, the models we train will not be literally optimal”.
My guess is that the thing you mean is something like “Sure, the conclusion of the post is that optimal models can do more than a polynomial speedup over humans, but we won’t train optimal models, and in fact the things we train will just be a polynomial speedup over humans”, which is compatible with my argument in the top-level comment. But in general your comments make me think that you’re interpreting the post completely differently than I am. [EDIT: actually now that I re-read the second half of your comment it makes sense to me. I still am confused about what you think this post means.]
I mean, Evan can speak to what he meant. I strongly disagree with the literal interpretation of the post, regardless of what Evan meant, for the reason I gave above.
I still am confused about what you think this post means
I usually think of the analogies as demonstrating the mechanism by which a particular scheme is meant to outperform a human. I do find these proof-analogies less compelling than the original one in debate because they’re simulating Turing machines which is not how I expect any of them to work in practice. (In contrast, the debate proof for accessing PSPACE was about narrowing in on disagreements as being equivalent to finding a winning path of a two-player game, which is the canonical “structure” of a PSPACE-complete problem and is similar to why we’d expect debate to incentivize honesty in practice.)
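For reference, the “winning path of a two-player game” structure being alluded to is TQBF: deciding whether a fully quantified boolean formula is true, i.e. whether the existential player has a winning strategy against the universal player. A minimal evaluator of that game tree, purely illustrative and not the debate protocol itself (the function name is made up):

```python
def tqbf(quantifiers, formula, assignment=()):
    """Evaluate a fully quantified boolean formula as a two-player game.

    `quantifiers` is one 'E' (exists) or 'A' (forall) per variable, in
    order; `formula` maps a complete tuple of booleans to True/False.
    The formula is true iff the existential player has a winning
    strategy -- TQBF is the canonical PSPACE-complete problem.
    """
    if not quantifiers:
        return formula(assignment)
    branches = [tqbf(quantifiers[1:], formula, assignment + (b,)) for b in (False, True)]
    return any(branches) if quantifiers[0] == 'E' else all(branches)

# Exists x1. Forall x2. (x1 AND x2)  -> False (no single x1 beats every x2)
print(tqbf(['E', 'A'], lambda a: a[0] and a[1]))
# Exists x1. Forall x2. (x1 OR x2)   -> True  (x1 = True wins against any x2)
print(tqbf(['E', 'A'], lambda a: a[0] or a[1]))
```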
I read the post as attempting to be literal, ctrl+F-ing “analog” doesn’t get me anything until the comments. Also, the post is the one that I read as assuming for the sake of analysis that humans can solve all problems in P, I myself wouldn’t necessarily assume that.
I mean, I assumed what I needed to in order to be able to do the proofs and have them make sense. What the proofs actually mean in practice is obviously up for debate, but I think that a pretty reasonable interpretation is that they’re something like analogies which help us get a handle on how powerful the different proposals are in theory.
What the proofs actually mean in practice is obviously up for debate, but I think that a pretty reasonable interpretation is that they’re something like analogies which help us get a handle on how powerful the different proposals are in theory.
I’m curious if you agree with the inference of conclusions 1 and 2 from premises 1, 2, and 3, and/or the underlying claim that it’s bad news to learn that your alignment scheme would be able to solve a very large complexity class.
I agree with the gist that it implies that arguments about the equilibrium policy don’t necessarily translate to real models, though I disagree that that’s necessarily bad news for the alignment scheme—it just means you need to find some guarantees that work even when you’re not at equilibrium.
The post assumes that humans can solve all problems in P (in fact, all problems solvable in polynomial time given access to an oracle for M), then proves that various protocols can solve tricky problems by getting the human player to solve problems in P that in reality aren’t typical human problems to solve.
I don’t see where the post says that humans can solve all problems solvable in polynomial time given access to an oracle. What Evan does is just replace humans (a rather fuzzy category) with polynomial-time algorithms (the consensus notion of a tractable algorithm in complexity theory).
Premise 1: We can’t build physical machines that solve problems outside of P.
On another note, you’re writing your comment on a physical device that can solve any computable problem, which obviously includes problems outside of P (if only by the time hierarchy theorem). The value of P is that it captures the problems one can solve on physical machines in a way that scales reasonably, so that you don’t need the lifespan of the universe to compute the answer for an instance of size 1000. But we clearly have algorithms for problems outside of P; we just believe/know they would take forever and/or too much space and/or too much energy.
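To put rough numbers on “take forever”, here is a back-of-the-envelope sketch; the 10^18 operations per second is just an assumed exascale-ish rate, and the conclusion is insensitive to it:

```python
# Back-of-the-envelope cost of an exponential-time algorithm on a fast machine.
# The 10**18 ops/sec rate is an assumption (roughly an exascale supercomputer).
ops_per_second = 10**18
seconds_per_year = 3.15 * 10**7

for n in (60, 100, 1000):
    years = 2**n / ops_per_second / seconds_per_year
    print(f"2^{n} steps: about {years:.3g} years")

# 2^60   : about a second of wall-clock time
# 2^100  : about 4e4 years
# 2^1000 : about 3e275 years -- "forever" for any practical purpose
```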
TBC by “can” I mean “can in practice”. Also if we’re getting picky my laptop is just a finite state machine :)
From the post:
Yes, this says that humans can solve any problem in P. It says nothing about using M as an oracle.
It doesn’t talk about using M as an oracle, but otherwise I don’t see how the proofs pan out: for instance, how else is
supposed to only take polynomial time?
You’re right: after rereading the proofs and talking with Evan, the only way for H to be polynomial time is to get oracle access to M. That is slightly unintuitive, but it makes sense because the part of the computation that depends on H and not on M is indeed polynomial time.
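A minimal sketch of what that reading amounts to; the names and structure below are generic stand-ins, not any of the post’s actual protocols. The point is only that H’s own work and its number of calls to M are polynomially bounded, while M’s internal computation isn’t charged to H:

```python
def judge(instance, model_oracle, num_steps):
    """Schematic judge H running in polynomial time relative to an oracle M.

    H performs `num_steps` iterations of its own cheap bookkeeping
    (polynomial in the instance size) and may query the oracle once per
    iteration.  Each query counts as a single step for H, no matter how
    much computation the model does internally -- which is all that
    "H is polynomial time given oracle access to M" asserts.
    """
    verdicts = []
    for i in range(num_steps):                 # polynomially many iterations
        query = (instance, i)                  # H's own cheap work
        verdicts.append(model_oracle(query))   # unit-cost oracle call
    return all(verdicts)                       # H's own cheap final check

# Toy usage: the "model" could be arbitrarily expensive inside,
# but it costs H only one step per call.
print(judge("some instance", lambda q: True, num_steps=10))  # True
```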