Short answer: use differential entropy and differential mutual information.
Differential entropy and Shannon entropy are both instances of a more general concept: Shannon entropy applies to discrete distributions, differential entropy to absolutely continuous ones.
KL-divergence is really more about approximation than about dependence between variables; it would also be strange to use KL-divergence for your purposes, since it is not symmetric. Still, KL-divergence is tightly connected to mutual information:
$$I(X;Y) = KL\big(p(x,y) \,\|\, p(x)\,p(y)\big)$$
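If you want a quick sanity check of this identity, here is a minimal sketch on a small discrete joint distribution (the probabilities are made up purely for illustration):

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q) in nats

# Illustrative joint distribution p(x, y) over two binary variables
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])
p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)

# Mutual information as KL divergence between the joint and the product of marginals
kl_form = entropy(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# Mutual information via the entropy identity I(X;Y) = H(X) + H(Y) - H(X,Y)
mi_form = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

print(kl_form, mi_form)  # the two numbers agree up to floating-point error
```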
Differential mutual information is a measure of dependence between variables; differential entropy is a measure of… what? In the discrete case we have the encoding interpretation, but it breaks down in the continuous case. The fact that differential entropy can be negative shouldn't bother you, because its interpretation is unclear anyway.
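For a concrete example of the negativity, take a uniform distribution on $[0, a]$:

$$h(X) = -\int_0^a \frac{1}{a}\ln\frac{1}{a}\,dx = \ln a,$$

which is negative whenever $a < 1$ (e.g. $h(X) = \ln 0.5 \approx -0.69$ for a uniform distribution on $[0, 0.5]$).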
As for (differential) mutual information, it can’t be negative, as you can see from the formula above (KL-divergence is non-negative). Nothing weird occurs here.
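You can also see this numerically. For a bivariate Gaussian with correlation $\rho$, the differential mutual information has the closed form $I(X;Y) = -\tfrac{1}{2}\ln(1-\rho^2) \ge 0$, and a generic k-NN estimator on samples lands in the same ballpark. Here is a rough sketch (the estimator choice is just one convenient option, nothing canonical):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression  # k-NN based MI estimator

rng = np.random.default_rng(0)
rho = 0.8
# Samples from a bivariate Gaussian with unit variances and correlation rho
xy = rng.multivariate_normal(mean=[0.0, 0.0],
                             cov=[[1.0, rho], [rho, 1.0]],
                             size=20_000)

closed_form = -0.5 * np.log(1 - rho**2)                                  # exact value in nats
knn_estimate = mutual_info_regression(xy[:, [0]], xy[:, 1],
                                      random_state=0)[0]                 # sample-based estimate

print(closed_form, knn_estimate)  # both are positive and roughly agree
```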
Brilliant! Thank you so much!