evhub comments on Alignment proposals and complexity classes

evhub 16 Jul 2020 21:11 UTC
LW: 2 AF: 1
AF
Glad you found the post exciting!

I follow up to there. What you’re showing is basically that you can train M to solve any problem in C using a specific alignment approach, without limiting in any way the computational power of M. So it might take an untractable amount of resources, like exponential time, for this model to solve a problem in PSPACE, but what matters here is just that it does. The point is to show that alignment approaches using a polynomial-time human can solve these problems, not how much resources they will use to do so.

Yep—that’s right.

Maybe it’s just grammar, but I read this sentence as saying that I trust the output of the polynomial-time strategy. And thus that you can solve PSPACE, NEXP, EXP and R problems in polynomial time. So I’m assuming that you mean trusting the model, which once again has no limits in terms of resources used.

I certainly don’t mean that the model needs to be polynomial time. I edited the post to try to clarify this point.

I looked for that statement in the paper, failed to find it, then realized you probably meant that raw polynomial time verification gives you NP (the certificate version of NP, basically). Riffing on the importance of optimal play, Irving et al. show that the debate game is a game in the complexity theoretic sense, and thus that it is equivalent to TQBF, a PSPACE-complete problem. But when seeing a closed formula as a game, the decision problem of finding whether it’s in TQBF amounts to showing the existence of a winning strategy for the first player. Debate solves this by assuming optimal play, and thus that the winning debater will have, find, and apply a winning strategy for the debate.

Yeah—that’s my interpretation of the debate paper as well.