Not sure what you mean here, but $2p - 1$ would linearly transform a probability $p$ from $[0, 1]$ to $[-1, 1]$. You could likewise transform a correlation coefficient $\phi$ to $[0, 1]$ with $\frac{\phi(A,B) + 1}{2}$. For $P(A) = P(B) = \frac{1}{2}$, this would correspond to the probability of A occurring if and only if B occurs, i.e. $\frac{\phi(A,B) + 1}{2} = P(A \leftrightarrow B)$ when $P(A) = P(B) = 0.5$.
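A quick numerical check of that identity, as a minimal sketch. The helper names `phi` and `p_iff` are hypothetical, and it assumes A and B are binary events described only by $P(A)$, $P(B)$ and $P(A \wedge B)$. With $P(A) = P(B) = \frac{1}{2}$ the phi coefficient reduces to $4P(A \wedge B) - 1$ and $P(A \leftrightarrow B) = 2P(A \wedge B)$, so both sides agree:

```python
import math


def phi(p_ab: float, p_a: float, p_b: float) -> float:
    """Phi coefficient of binary events A and B from P(A and B), P(A), P(B).

    Assumes 0 < P(A) < 1 and 0 < P(B) < 1 so the denominator is nonzero.
    """
    cov = p_ab - p_a * p_b
    return cov / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


def p_iff(p_ab: float, p_a: float, p_b: float) -> float:
    """P(A <-> B) = P(A and B) + P(not A and not B), via inclusion-exclusion."""
    p_neither = 1 - p_a - p_b + p_ab
    return p_ab + p_neither


# With P(A) = P(B) = 0.5, P(A and B) ranges over [0, 0.5] and the two columns match.
for p_ab in (0.0, 0.1, 0.25, 0.4, 0.5):
    lhs = (phi(p_ab, 0.5, 0.5) + 1) / 2
    rhs = p_iff(p_ab, 0.5, 0.5)
    print(f"P(A and B)={p_ab:.2f}  (phi+1)/2={lhs:.3f}  P(A<->B)={rhs:.3f}")

# Without 0.5 marginals the identity breaks: independent events with
# P(A) = P(B) = 0.7 give phi of about 0, so (phi+1)/2 is about 0.5,
# while P(A<->B) is about 0.58.
print((phi(0.49, 0.7, 0.7) + 1) / 2, p_iff(0.49, 0.7, 0.7))
```

The last line shows why the $P(A) = P(B) = 0.5$ condition matters.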
So, my main idea is that the principle of maximum entropy (aka the principle of indifference) suggests a prior of $1/n$, where $n$ is the number of possibilities or classes. Inverting $c = 2p - 1$ gives $p = 0.5$ for $c = 0$. What I want is for $c = 0$ to lead to $p = 1/n$ rather than $0.5$, so that it works in the multiclass case where $n$ is greater than 2.
What’s the solution?
$p = \frac{n^c (c + 1)}{2^c \, n}$
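A minimal Python sketch of the formula, for checking its behaviour. The name `corr_to_prob` is hypothetical, and the range checks are an added assumption rather than part of the formula. Rewriting it as $p = \frac{c+1}{2}\left(\frac{n}{2}\right)^{c-1}$ makes the key properties easy to see: $c = -1, 0, 1$ give $0$, $1/n$ and $1$, and for $n = 2$ it reduces to $\frac{c+1}{2}$, the inverse of $2p - 1$.

```python
import math


def corr_to_prob(c: float, n: int) -> float:
    """Map c in [-1, 1] to a probability so that c = 0 gives the 1/n prior.

    Computes p = n**c * (c + 1) / (2**c * n). The range checks below are an
    added assumption: c outside [-1, 1] can give p < 0 or p > 1, and n >= 2
    is the multiclass setting the formula targets.
    """
    if not -1.0 <= c <= 1.0:
        raise ValueError(f"c must lie in [-1, 1], got {c}")
    if n < 2:
        raise ValueError(f"n must be at least 2, got {n}")
    return n ** c * (c + 1) / (2 ** c * n)


for n in (2, 3, 10):
    # c = -1, 0, 1 should map to 0, 1/n and 1 respectively.
    assert corr_to_prob(-1.0, n) == 0.0
    assert math.isclose(corr_to_prob(0.0, n), 1 / n)
    assert math.isclose(corr_to_prob(1.0, n), 1.0)
    print(n, [round(corr_to_prob(c, n), 4) for c in (-1.0, -0.5, 0.0, 0.5, 1.0)])
```

For $n \ge 2$ the result increases monotonically in $c$ (the derivative of $\ln p$ is $\frac{1}{c+1} + \ln\frac{n}{2} > 0$), so it stays inside $[0, 1]$.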
As far as I know, this is unpublished in the literature. It’s a pretty obscure use case, so that’s not surprising. I have doubts I’ll ever get around to publishing the paper I wanted to write that uses this in an activation function to replace softmax in neural nets, so it probably doesn’t matter much if I show it here.