I read the post as attempting to be literal; ctrl+F-ing “analog” doesn’t get me anything until the comments. Also, the post is the one that I read as assuming, for the sake of analysis, that humans can solve all problems in P; I myself wouldn’t necessarily assume that.
Also, regarding your version of premise 1, I think I buy that AI can only give you a polynomial speedup over humans.
I think this is easily handled by saying “in practice, the models we train will not be literally optimal”.
My guess is that the thing you mean is something like “Sure, the conclusion of the post is that optimal models can do more than a polynomial speedup over humans, but we won’t train optimal models, and in fact the things we train will just be a polynomial speedup over humans”, which is compatible with my argument in the top-level comment. But in general your comments make me think that you’re interpreting the post completely differently than I am. [EDIT: actually now that I re-read the second half of your comment it makes sense to me. I still am confused about what you think this post means.]
I mean, Evan can speak to what he meant. I strongly disagree with the literal interpretation of the post, regardless of what Evan meant, for the reason I gave above.
> I still am confused about what you think this post means
I usually think of the analogies as demonstrating the mechanism by which a particular scheme is meant to outperform a human. I do find these proof-analogies less compelling than the original one in debate, because they work by simulating Turing machines, which is not how I expect any of these schemes to work in practice. (In contrast, the debate proof for accessing PSPACE was about narrowing in on disagreements being equivalent to finding a winning path of a two-player game, which is the canonical “structure” of a PSPACE-complete problem and is close to the reason we’d expect debate to incentivize honesty in practice.)
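For concreteness, here’s a minimal sketch of that canonical structure (my own illustration, not anything from the post): TQBF, the standard PSPACE-complete problem, asks whether a fully quantified Boolean formula is true, which is exactly the question of whether the “exists” player has a winning strategy in a two-player game over the formula’s variables.

```python
# Illustration only: TQBF (truth of a fully quantified Boolean formula) is the
# canonical PSPACE-complete problem, and it has a natural two-player reading:
# the "exists" player picks values trying to make the formula true, while the
# "forall" player picks values trying to make it false. The formula is true
# exactly when the exists player has a winning strategy.

def exists_player_wins(quantifiers, formula, assignment=()):
    """quantifiers: a list of 'E' (exists) or 'A' (forall), one per variable, in order.
    formula: a function from a tuple of booleans (one per variable) to a boolean.
    Returns True iff the quantified formula is true, i.e. the exists player wins."""
    if len(assignment) == len(quantifiers):
        return formula(assignment)
    branches = [exists_player_wins(quantifiers, formula, assignment + (value,))
                for value in (False, True)]
    # The exists player only needs one winning move; the forall player needs them all.
    return any(branches) if quantifiers[len(assignment)] == 'E' else all(branches)

# "exists x. forall y. (x or y) and (x or not y)" is true: take x = True.
print(exists_player_wins(['E', 'A'],
                         lambda v: (v[0] or v[1]) and (v[0] or not v[1])))  # True
```

The analogy I’m pointing at is that two debaters narrowing in on their disagreement traces out one path of a game like this, rather than simulating a Turing machine step by step.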
> I read the post as attempting to be literal; ctrl+F-ing “analog” doesn’t get me anything until the comments. Also, the post is the one that I read as assuming, for the sake of analysis, that humans can solve all problems in P; I myself wouldn’t necessarily assume that.
I mean, I assumed what I needed to in order to be able to do the proofs and have them make sense. What the proofs actually mean in practice is obviously up for debate, but I think that a pretty reasonable interpretation is that they’re something like analogies which help us get a handle on how powerful the different proposals are in theory.
> What the proofs actually mean in practice is obviously up for debate, but I think that a pretty reasonable interpretation is that they’re something like analogies which help us get a handle on how powerful the different proposals are in theory.
I’m curious if you agree with the inference of conclusions 1 and 2 from premises 1, 2, and 3, and/or the underlying claim that it’s bad news to learn that your alignment scheme would be able to solve a very large complexity class.
I agree with the gist that it implies that arguments about the equilibrium policy don’t necessarily translate to real models, though I disagree that that’s necessarily bad news for the alignment scheme—it just means you need to find some guarantees that work even when you’re not at equilibrium.