Correlation space runs from −1 to 1, with 1 meaning the same (definitely true), −1 the opposite (definitely false), and 0 orthogonal (maximum uncertainty). I had the idea that you could take maximum uncertainty to be 0 in correlation space, and 1/n (the uniform distribution) in probability space.
Not sure what you mean here, but p × 2 − 1 would linearly transform a probability p from [0..1] to [−1..1]. You could likewise transform a correlation coefficient ϕ to [0..1] with (ϕ(A,B) + 1)/2. For P(A) = P(B) = 1/2, this would correspond to the probability of A occurring if and only if B occurs. I.e. (ϕ(A,B) + 1)/2 = P(A↔B) when P(A) = P(B) = 0.5.
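That identity is easy to check numerically. A minimal sketch (the joint probabilities below are arbitrary, chosen so that P(A) = P(B) = 0.5; `phi` is just an illustrative helper name):

```python
import math

def phi(p11, p10, p01, p00):
    """Phi coefficient computed from the 2x2 joint distribution of binary A, B."""
    pa, pb = p11 + p10, p11 + p01  # marginals P(A), P(B)
    return (p11 * p00 - p10 * p01) / math.sqrt(pa * (1 - pa) * pb * (1 - pb))

# A joint distribution with P(A) = P(B) = 0.5:
p11, p10, p01, p00 = 0.4, 0.1, 0.1, 0.4
p_iff = p11 + p00  # P(A <-> B): both true or both false

print((phi(p11, p10, p01, p00) + 1) / 2)  # 0.8
print(p_iff)                              # 0.8
```

Both quantities come out equal (0.8 here), matching the claimed (ϕ + 1)/2 = P(A↔B) under equal marginals of 0.5.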
So, my main idea is that the principle of maximum entropy, aka the principle of indifference, suggests a prior of 1/n, where n is the number of possibilities or classes. Inverting p × 2 − 1, i.e. p = (c + 1)/2, gives p = 0.5 for c = 0. What I want is for c = 0 to lead to p = 1/n rather than 0.5, so that it works in multiclass cases where n is greater than 2.
What’s the solution?
p = (n^c * (c + 1)) / (2^c * n)
As far as I know, this is unpublished in the literature. It’s a pretty obscure use case, so that’s not surprising. I have doubts I’ll ever get around to publishing the paper I wanted to write that uses this in an activation function to replace softmax in neural nets, so it probably doesn’t matter much if I show it here.