This is a fact worth knowing and a lucid explanation—thanks for writing this!
I know it’s not the main point of the post, but I found myself a little lost when you talked about complexity theory; I would be interested to hear more details. In particular:
When you say
what are the definitions of these classes? Are these decision problems in the usual sense, or is this a statement about learning theory? I haven’t been able to either track down a definition of e.g. “NP-invertible” or invent one that would make sense—if this is a renamed version of something people have studied I would be curious to know the original name.
You claim “the assumption P≠BPP implies that there is no way, in polynomial time, to get reliable information about A’s secret circuit C (beyond some statistical regularities)”—I suspect this is not quite what you mean. I’m not sure exactly what formal statement you’re making here, but it would be pretty unusual for P≠BPP to be a relevant assumption (in fact most people in TCS strongly believe P=BPP). You call this “the probabilistic version [of P≠NP]”, so I’m guessing this was a typo and you mean NP ⊄ BPP? But I would also be impressed if you could get something like this statement from only NP ⊄ BPP. My best guess would be that you do need those cryptographic assumptions you mentioned for this part (that is, if you want to say “P/poly is not PAC learnable”—in other words, no polynomial-time algorithm can, given random input-output samples from an arbitrary poly-size circuit, find a circuit computing a nearby function except with tiny success probability—I’m pretty sure this is only known conditional on the existence of one-way functions).
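(For concreteness, here is a sketch of the PAC-learnability statement I have in mind; the notation and the quantifier bookkeeping are mine, not the post's. "P/poly is not PAC learnable" would then be the negation of this when C ranges over all polynomial-size circuits.)

```latex
% Sketch (my notation, not the post's) of the PAC criterion being denied:
\exists\, \mathrm{poly}(n, 1/\epsilon, 1/\delta)\text{-time } \mathcal{A}\;\;
\forall\, C \in \mathsf{SIZE}(n^k)\;\;
\forall\, \text{distributions } D\;\;
\forall\, \epsilon, \delta > 0:\;
\Pr_{x_1,\dots,x_m \sim D}\!\Big[\Pr_{x \sim D}\big[h(x) \neq C(x)\big] \le \epsilon\Big] \ge 1-\delta,
\quad \text{where } h = \mathcal{A}\big((x_1, C(x_1)),\dots,(x_m, C(x_m)), \epsilon, \delta\big)
\text{ and } m = \mathrm{poly}(n, 1/\epsilon, 1/\delta).
```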
Again though, I know these are quibbles and the main point of this section is “you don’t need any unproven complexity assumptions to get lower bounds against statistical query learning”.
(Separately, a minor nit with the presentation: I found the decision to use “learning algorithm” to refer specifically to “weight-based ML architecture that learns via SGD” to be slightly confusing. The takeaway being “there exists an algorithm that learns parities, but there doesn’t exist a learning algorithm that learns parities” is a little slippery—I think it would be worth replacing “learning algorithm” with e.g. “gradient-based learning algorithm” in the writeup.)
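(To make the distinction concrete, here is a minimal sketch, not from the post, of the "ordinary" algorithm that learns parities: given random labeled samples of an unknown parity function, solve the linear system over GF(2) by Gaussian elimination. The setup and variable names are my own assumptions about the standard parity-learning problem.)

```python
# Sketch (mine, not the post's): a polynomial-time algorithm that learns a
# parity function exactly from random labeled samples, by solving a linear
# system over GF(2). This is the "algorithm that learns parities" that
# gradient-based / statistical-query learners cannot match.
import numpy as np

rng = np.random.default_rng(0)

n = 32                                  # number of input bits
secret = rng.integers(0, 2, n)          # hidden parity: XOR of the bits where secret == 1

m = 2 * n                               # a few times n samples suffices with high probability
X = rng.integers(0, 2, (m, n))          # random inputs
y = (X @ secret) % 2                    # labels: <x, secret> mod 2

def solve_gf2(A, b):
    """Return one solution s of A s = b over GF(2) via Gauss-Jordan elimination."""
    A, b = A.copy() % 2, b.copy() % 2
    rows, cols = A.shape
    s = np.zeros(cols, dtype=int)
    pivots, row = [], 0
    for col in range(cols):
        hit = next((r for r in range(row, rows) if A[r, col] == 1), None)
        if hit is None:
            continue                    # free variable; leave s[col] = 0
        A[[row, hit]], b[[row, hit]] = A[[hit, row]], b[[hit, row]]
        for r in range(rows):           # eliminate this column from every other row
            if r != row and A[r, col] == 1:
                A[r] = (A[r] + A[row]) % 2
                b[r] = (b[r] + b[row]) % 2
        pivots.append((row, col))
        row += 1
    for r, c in pivots:
        s[c] = b[r]
    return s

recovered = solve_gf2(X, y)
assert np.array_equal((X @ recovered) % 2, y)   # consistent with every sample
print("recovered the secret parity:", np.array_equal(recovered, secret))
```

With m ≈ 2n random samples the system is full rank with high probability, so this recovers the hidden parity exactly, in polynomial time—while no gradient-based or statistical-query learner can do so efficiently.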
Confirming that efficiently finding a small circuit consistent with its values on a fixed collection of test inputs (you don’t actually need any restriction beyond size) is known to imply NP ⊆ BPP; see this paper.