This is a fact worth knowing and a lucid explanation—thanks for writing this!
I know it’s not the main point of the post, but I found myself a little lost when you talked about complexity theory; I would be interested to hear more details. In particular:
When you say “Learning algorithms ⊂ BPP-invertible ⊂ NP-invertible”: what are the definitions of these classes? Are these decision problems in the usual sense, or is this a statement about learning theory? I haven’t been able either to track down a definition of e.g. “NP-invertible” or to invent one that would make sense—if this is a renamed version of something people have studied, I would be curious to know the original name.
You claim “the assumption P≠BPP implies that there is no way, in polynomial time, to get reliable information about A’s secret circuit C (beyond some statistical regularities)”—I suspect this is not quite what you mean. I’m not sure exactly what formal statement you’re making here, but it would be pretty unusual for P≠BPP to be the relevant assumption (in fact, most people in TCS strongly believe P=BPP). You call this “the probabilistic version [of P≠NP]”, so I’m guessing this was a typo and you mean NP ⊄ BPP? But I would also be impressed if you could get something like this statement from NP ⊄ BPP alone. My best guess is that you do need those cryptographic assumptions you mentioned for this part: that is, if you want to say “P/poly is not PAC learnable”—in other words, that no polynomial-time algorithm can, given random input-output samples from an arbitrary poly-size circuit, find a circuit computing a nearby function except with tiny success probability—I’m pretty sure this is only known conditional on the existence of one-way functions.
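For concreteness, the formal statement I have in mind is something like the following (my own rendering; the particular constants are arbitrary):

$$\forall\ \text{poly-time } L\ \ \exists\, C \in \mathsf{P/poly}:\quad \Pr_{x_1,\dots,x_m \sim U_n}\Big[\operatorname{err}_C\big(L\big((x_i, C(x_i))_{i\le m}\big)\big) \le \tfrac{1}{10}\Big] \le \operatorname{negl}(n),$$

where $\operatorname{err}_C(h) = \Pr_{x\sim U_n}[h(x)\ne C(x)]$ and $U_n$ is the uniform distribution on $\{0,1\}^n$. As far as I know, statements of this shape are proved from one-way functions (e.g. via pseudorandom functions), not from NP ⊄ BPP.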
Again though, I know these are quibbles and the main point of this section is “you don’t need any unproven complexity assumptions to get lower bounds against statistical query learning”.
(Separately, a minor nit with the presentation: I found the decision to use “learning algorithm” to refer specifically to a weight-based ML architecture that learns via SGD slightly confusing. A takeaway of the form “there exists an algorithm that learns parities, but there doesn’t exist a learning algorithm that learns parities” is a little slippery—I think it would be worth replacing “learning algorithm” with e.g. “gradient-based learning algorithm” in the writeup.)
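For what it’s worth, the “there exists an algorithm that learns parities” direction is just Gaussian elimination over GF(2): each labeled example (x, f(x)) is a linear equation in the secret vector, and O(n) random examples pin it down. A minimal sketch (my own toy code, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20       # number of input bits
m = 2 * n    # O(n) random examples suffice with high probability

# Hidden parity function f(x) = <s, x> mod 2 for a secret s in {0,1}^n.
s = rng.integers(0, 2, size=n)

# Random labeled examples (x, f(x)) -- the same sample access as above.
X = rng.integers(0, 2, size=(m, n))
y = (X @ s) % 2

def solve_gf2(A, b):
    """Gaussian elimination over GF(2): solve A x = b (mod 2)."""
    A, b = A.copy() % 2, b.copy() % 2
    rows, cols = A.shape
    row, pivots = 0, []
    for col in range(cols):
        # Find a row at or below `row` with a 1 in this column.
        hits = np.nonzero(A[row:, col])[0]
        if hits.size == 0:
            continue  # free column; skip it
        piv = row + hits[0]
        A[[row, piv]], b[[row, piv]] = A[[piv, row]], b[[piv, row]]
        # Eliminate this column from every other row.
        for r in range(rows):
            if r != row and A[r, col]:
                A[r] ^= A[row]
                b[r] ^= b[row]
        pivots.append(col)
        row += 1
    x = np.zeros(cols, dtype=A.dtype)
    for r, col in enumerate(pivots):
        x[col] = b[r]   # free variables (if any) stay 0
    return x

s_hat = solve_gf2(X, y)
print(np.array_equal(s_hat, s))  # True w.h.p.: X has full column rank
```

So parities are easy to learn from random examples; the lower bound only bites once you restrict to statistical-query/gradient-style access.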