like if you want it to recognize spam emails, but you only show it aspects of the emails such that there is at best a statistically weak correlation between them and whether the email is spam or not-spam
Here “statistically weak correlation” should be “not a lot of mutual information”, since correlation is only about linear dependence between random variables.
On the first – I don’t understand why correlation doesn’t apply. “Spam” is a random variable, and whatever feature you show, like “length”, is also a random variable – right?
Two nitpicks:
Here “statistically weak correlation” should be “not a lot of mutual information”, since correlation is only about linear dependence between random variables.
Should be i.i.d.
Thanks; fixed the second one.
On the first – I don’t understand why correlation doesn’t apply. “Spam” is a random variable, and whatever feature you show, like “length”, is also a random variable – right?