I’ve always understood it as an analogy (e.g. from the debate paper, “Although debate is intended for use with fuzzy humans as judges, we can gain intuition about the model by replacing the human with an arbitrary polynomial time algorithm”). But if I take your perspective, then we have:
Premise 0: Humans can solve all problems in P.
Premise 1: We can’t build physical machines that solve problems outside of P.
Conclusion: We can’t build physical machines that can solve problems that humans can’t.
Seems like you should probably ditch one of the premises.
(I do think premise 0 is pretty wrong—I can solve arbitrary SAT problems that have < 10 clauses, despite it being NP-complete, but I can’t multiply a trillion numbers together, even though that is in P. Complexity classes impose scaling limits; humans have resource limits; these are very different. Yes, I do know that “SAT problems with < 10 clauses” is technically in P.)
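To make the point about scaling limits versus resource limits concrete, here is a toy sketch (my own illustration, not anything from the post or the paper): brute force dispatches tiny SAT instances instantly, because the 2^n blowup only bites as the instance grows; the clause encoding below is just a made-up DIMACS-style convention for the example.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Try every assignment of num_vars boolean variables.

    Each clause is a list of signed ints in a made-up DIMACS-style
    encoding: 3 means x3, -3 means NOT x3.  With a handful of clauses
    (and so a handful of variables) the 2**num_vars loop is tiny,
    even though SAT in general is NP-complete.
    """
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment  # a satisfying assignment, if one exists
    return None

# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
print(brute_force_sat([[1, -2], [2, 3], [-1, -3]], num_vars=3))
```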
Also, regarding your version of premise 1, I think I buy that AI can only give you a polynomial speedup over humans.
I think this is easily handled by saying “in practice, the models we train will not be literally optimal”. It’s still useful to know what would happen at optimality (e.g. I do in fact have more trust in deep RL algorithms that have proofs of convergence to optimal policies given infinite model capacity, compute, and data).
I read the post as attempting to be literal; ctrl+F-ing “analog” doesn’t get me anything until the comments. Also, the post is the one that I read as assuming, for the sake of analysis, that humans can solve all problems in P; I myself wouldn’t necessarily assume that.
> Also, regarding your version of premise 1, I think I buy that AI can only give you a polynomial speedup over humans.

> I think this is easily handled by saying “in practice, the models we train will not be literally optimal”.
My guess is that the thing you mean is something like “Sure, the conclusion of the post is that optimal models can do more than a polynomial speedup over humans, but we won’t train optimal models, and in fact the things we train will just be a polynomial speedup over humans”, which is compatible with my argument in the top-level comment. But in general your comments make me think that you’re interpreting the post completely differently than I am. [EDIT: actually now that I re-read the second half of your comment it makes sense to me. I still am confused about what you think this post means.]
I mean, Evan can speak to what he meant. I strongly disagree with the literal interpretation of the post, regardless of what Evan meant, for the reason I gave above.
> I still am confused about what you think this post means
I usually think of the analogies as demonstrating the mechanism by which a particular scheme is meant to outperform a human. I do find these proof-analogies less compelling than the original one in debate because they’re simulating Turing machines, which is not how I expect any of them to work in practice. (In contrast, the debate proof for accessing PSPACE was about narrowing in on disagreements as being equivalent to finding a winning path of a two-player game, which is the canonical “structure” of a PSPACE-complete problem and is similar to why we’d expect debate to incentivize honesty in practice.)
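For concreteness, here is a minimal sketch of the game structure I mean (my own toy illustration, not anything from the debate paper): TQBF, the canonical PSPACE-complete problem, can be evaluated by recursing over the two players’ moves, with the existential player trying to make the formula come out true and the universal player trying to make it come out false. The formula is true exactly when the first player has a winning strategy, which is the “winning path of a two-player game” structure the proof leans on.

```python
def tqbf(quantifiers, formula, assignment=None):
    """Evaluate a quantified boolean formula as a two-player game.

    `quantifiers` is a list of (player, variable) pairs, where 'E'
    (exists) is the player trying to make the formula true and 'A'
    (forall) is the player trying to make it false.  `formula` is any
    predicate over a dict of variable assignments.  The formula is
    true iff the 'E' player has a winning strategy, i.e. a winning
    path through the two-player game tree.
    """
    assignment = assignment or {}
    if not quantifiers:
        return formula(assignment)
    (player, var), rest = quantifiers[0], quantifiers[1:]
    outcomes = (tqbf(rest, formula, {**assignment, var: val})
                for val in (False, True))
    return any(outcomes) if player == 'E' else all(outcomes)

# Exists x . Forall y . (x or not y) is true: E wins by picking x = True.
print(tqbf([('E', 'x'), ('A', 'y')], lambda a: a['x'] or not a['y']))
```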
> I read the post as attempting to be literal; ctrl+F-ing “analog” doesn’t get me anything until the comments. Also, the post is the one that I read as assuming, for the sake of analysis, that humans can solve all problems in P; I myself wouldn’t necessarily assume that.
I mean, I assumed what I needed to in order to be able to do the proofs and have them make sense. What the proofs actually mean in practice is obviously up for debate, but I think that a pretty reasonable interpretation is that they’re something like analogies which help us get a handle on how powerful the different proposals are in theory.
> What the proofs actually mean in practice is obviously up for debate, but I think that a pretty reasonable interpretation is that they’re something like analogies which help us get a handle on how powerful the different proposals are in theory.
I’m curious if you agree with the inference of conclusions 1 and 2 from premises 1, 2, and 3, and/or the underlying claim that it’s bad news to learn that your alignment scheme would be able to solve a very large complexity class.
I agree with the gist that it implies that arguments about the equilibrium policy don’t necessarily translate to real models, though I disagree that that’s necessarily bad news for the alignment scheme—it just means you need to find some guarantees that work even when you’re not at equilibrium.