Here’s an example to show where Pearson’s approach goes wrong:
A Q comes in two types: an X or a Y.
An X comes in two types: A1 or B2.
A Y comes in two types: A2 or B1.
You need to reveal the A/B and 1/2 data for your Qs while keeping secret whether each Q is an X or a Y.
The A vs. B and 1 vs. 2 properties of a Q are uncorrelated with whether it’s an X or a Y. You can reveal one or the other, but not both, while keeping the secret. If you reveal both, even though neither piece of information is correlated with the secret identity, you reveal the secret.
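To make that concrete, here is a rough sketch of the check in Python. It assumes the four kinds of Q (A1, B2, A2, B1) are equally likely, which the description above does not state, and the 0/1 encoding and all names in it are just made up for illustration.

```python
from statistics import mean

# Assumption: the four kinds of Q are equally likely.
# X comes as A1 or B2, Y comes as A2 or B1.
qs = [("A", 1, "X"), ("B", 2, "X"), ("A", 2, "Y"), ("B", 1, "Y")]


def covariance(us, vs):
    """Population covariance E(UV) - E(U)E(V) over equally likely outcomes."""
    products = [u * v for u, v in zip(us, vs)]
    return mean(products) - mean(us) * mean(vs)


# Encode each property as a 0/1 indicator (an arbitrary choice).
is_a = [1 if letter == "A" else 0 for letter, _, _ in qs]
is_1 = [1 if number == 1 else 0 for _, number, _ in qs]
is_x = [1 if kind == "X" else 0 for _, _, kind in qs]

# Each revealed property on its own has zero covariance with the secret.
cov_letter = covariance(is_a, is_x)
cov_number = covariance(is_1, is_x)

# Revealed together, the pair (letter, number) pins down the secret exactly:
# every pair maps to exactly one kind.
secret_of = {(letter, number): kind for letter, number, kind in qs}

print(cov_letter, cov_number)
print(secret_of[("A", 1)], secret_of[("A", 2)])
```

Both covariances come out to zero, yet the lookup table has one entry per (letter, number) pair, so revealing both properties gives away the type.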
No, that’s not what’s wrong with Pearson’s approach. Your example suffers from a different issue.
Can you give an example to explain? It’s the best example I could give based on the description in the OP.
What you are describing is data (A/B, 1/2) such that each part of the data is independent of the secret X/Y, but the data as a whole is not. That issue would be unusual for any statistical approach, because it should be clear that the leaked data has to be considered as a whole, not piece by piece.
The problem with the Pearson correlation criterion is that it does not measure independence at all (even for parts of the data). It measures correlation, which is just a single statistic of the two variables. It’s as if you compared two distributions by comparing only their means.
Let’s say the leaked data is X = −2, −1, 1, 2 with equal probability, and the secret data is Y = X^2. Zero correlation just means E(XY) - E(X)E(Y) = 0, which holds here because E(XY) = E(X^3) = 0 and E(X) = 0. Yet the secret can be fully recovered from the leaked data, so the two are not independent at all.
See more at https://en.wikipedia.org/wiki/Correlation_and_dependence#Correlation_and_independence
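For anyone who wants to check the arithmetic, here is a small Python sketch of the X, Y = X^2 example. The helper name expect and the choice of exact fractions are mine, not part of any library or of the original post.

```python
from fractions import Fraction

# Leaked data X takes each value with probability 1/4; the secret is Y = X^2.
xs = [-2, -1, 1, 2]
ys = [x * x for x in xs]
p = Fraction(1, len(xs))


def expect(values):
    """Expectation over the four equally likely outcomes (hypothetical helper)."""
    return sum(p * v for v in values)


# Covariance E(XY) - E(X)E(Y), the numerator of the Pearson correlation.
cov = expect([x * y for x, y in zip(xs, ys)]) - expect(xs) * expect(ys)

# Independence would require P(Y = 4 | X = 2) to equal P(Y = 4).
p_y4 = expect([1 if y == 4 else 0 for y in ys])
outcomes_with_x2 = [y for x, y in zip(xs, ys) if x == 2]
p_y4_given_x2 = Fraction(
    sum(1 for y in outcomes_with_x2 if y == 4), len(outcomes_with_x2)
)

print(cov, p_y4, p_y4_given_x2)
```

The covariance is exactly zero, while conditioning on X = 2 moves the probability of Y = 4 from 1/2 to 1, which is about as far from independence as you can get.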