Is the complaint that you can’t do predicate calculus on the probabilities? Because I can certainly use predicate calculus all I want on the expressions within the probabilities.
And if that is the complaint, then my question is: why do we want to do predicate calculus on the probabilities? Like, what would be one concrete application in which we’d want to do that? (Self-reference and things in that cluster would be the obvious use-case, I’m mostly curious if there’s any other use-case.)
Imagine you have a function f that takes a_1, a_2, …, a_n and returns b_1, b_2, …, b_m. The a_1, a_2, …, a_n are Boolean states of the known world, and b_1, b_2, …, b_m are Boolean states of the world you don't yet know. Because f uses predicate logic internally, you can't modify it to take values between 0 and 1; you have to accept that it only takes Boolean values.
When you do your probability augmentation you can easily add probabilities to a_1, a_2, …, a_n and have P(a_1), P(a_2), …, P(a_n), as those are part of the known world.
On the other hand, how would you get P(b_1), P(b_2), … , P(b_m)?
I’m not quite understanding the example yet. Two things which sound similar, but are probably not what you mean because they’re straightforward Bayesian models:
I’m given a function f: A → B and a distribution (a↦P[A=a]) over the set A. Then I push forward the distribution on A through f to get a distribution over B.
Same as previous, but the function f is also unknown, so to do things Bayesian-ly I need to have a prior over f (more precisely, a joint prior over f and A).
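For concreteness, here is a minimal sketch of that second model under a toy setup (the candidate functions, the prior weights, and the two-bit input space are all invented for illustration): uncertainty over f just becomes one more thing to sum over when forming the predictive distribution on B.

```python
from itertools import product
from collections import defaultdict

# Hypothetical candidate functions over two Boolean inputs, with a prior over them.
candidates = {
    "and": lambda a: (a[0] and a[1],),
    "or":  lambda a: (a[0] or a[1],),
    "xor": lambda a: (a[0] != a[1],),
}
prior_f = {"and": 0.5, "or": 0.3, "xor": 0.2}

# Prior over inputs; f and A are taken as independent here, so the joint prior factorizes.
P_A = {a: 1 / 4 for a in product([False, True], repeat=2)}

# Predictive distribution over B: P[B = b] = sum over f, a of P(f) * P(a) * I[f(a) = b]
P_B = defaultdict(float)
for name, fn in candidates.items():
    for a, p_a in P_A.items():
        P_B[fn(a)] += prior_f[name] * p_a

print(dict(P_B))
```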
How is the thing you’re saying different from those?
Or: it sounds like you’re talking about an inference problem, so what’s the inference problem? What information is given, and what are we trying to predict?
I’m talking about a function that takes a one-dimensional vector of booleans A and returns a one-dimensional vector B. The function does not accept a one-dimensional vector of real numbers between 0 and 1.
To be able to “push forward” probabilities, f would need to be defined to handle probabilities.
The standard push forward here would be:
$P[B = b] = \sum_a I[f(a) = b] \, P[A = a]$
where I[...] is an indicator function. In terms of interpretation: this is the frequency at which I will see B take on value b, if I sample A from the distribution P[A] and then compute B via B = f(A).
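A minimal sketch of that push-forward, assuming a small Boolean example (the function f, the uniform input distribution, and all names here are hypothetical stand-ins):

```python
from itertools import product
from collections import defaultdict

def f(a):
    # Hypothetical Boolean-only function: predicate logic inside, no real-valued inputs.
    a1, a2, a3 = a
    return (a1 and a2, a2 or not a3)  # (b_1, b_2)

# A distribution over Boolean input vectors a (weights sum to 1); uniform here for illustration.
P_A = {a: 1 / 8 for a in product([False, True], repeat=3)}

# Push forward: P[B = b] = sum over a of I[f(a) = b] * P[A = a]
P_B = defaultdict(float)
for a, p in P_A.items():
    P_B[f(a)] += p

print(dict(P_B))  # distribution over output vectors b
```

Note that f is only ever called on Boolean vectors; it never has to "handle probabilities" itself, because the probabilities live in the weights being summed, not in the arguments passed to f.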
What do you want to do which is not that, and why do you want to do it?
Most of the time, the data you gather about the world is a bunch of facts with probabilities attached to the individual data points, and you want the output to be probabilities over individual data points as well.
As far as my own background goes, I have not studied logic or the math behind the AI algorithm that David Chapman wrote. I did study bioinformatics, and in that program we talked about the probability calculations done in bioinformatics, so I have some intuitions from that domain. So I'll take a bioinformatics example, even if I don't know exactly how to productively apply predicate calculus to it.
If you, for example, get input data from gene sequencing with billions of probabilities (a_1, a_2, …, a_n), you want output data about whether or not the individual genetic mutations exist (b_1, b_2, …, b_m), and not just P(B) = P(b_1) * P(b_2) * … * P(b_m).
If you have m = 100,000 possible genetic mutations, P(B) is a very small number with little robustness to error. A single bad b_x will propagate and make your total P(B) unreliable. You might have an application where getting b_234, b_9538, and b_33889 wrong is an acceptable error, because most of the values were good.
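One way to get those per-mutation numbers is to read off coordinate-wise marginals instead of the joint: sample Boolean input vectors according to the P(a_j), run the Boolean-only f, and average each output coordinate separately. A rough sketch, with a toy f and made-up input probabilities standing in for a real sequencing pipeline:

```python
import random

def f(a):
    # Toy stand-in for the Boolean-only pipeline: n Boolean inputs in, m Boolean outputs out.
    return (a[0] and a[1], a[1] != a[2], any(a))

P_a = [0.9, 0.2, 0.5]   # hypothetical P(a_1), P(a_2), P(a_3), treated as independent here
N = 100_000             # number of Monte Carlo samples

m = len(f([False] * len(P_a)))
counts = [0] * m
for _ in range(N):
    a = [random.random() < p for p in P_a]   # sample a Boolean input vector
    for i, b_i in enumerate(f(a)):
        counts[i] += b_i                     # bools count as 0/1

marginals = [c / N for c in counts]          # estimates of P(b_1), ..., P(b_m)
print(marginals)
```

Because each P(b_i) is estimated on its own, a few unreliable coordinates don't drag the other estimates down the way a single bad factor drags down the product P(B).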