First, the part about using models/logics with probabilities. (This part isn’t about model comparison per se, but is necessary foundation.) (Terminological note: the thing a logician would call a “logic”, or possibly a “logic augmented with some probabilities”, I would instead normally call a “model” in the context of Bayesian probability, and the thing a logician would call a “model” I would instead normally call a “world” in the context of Bayesian probability; I think that’s roughly how standard usage works.) Roughly speaking: you have at least one plain old (predicate) logic, and all “random” variables are scoped to their logic, just as in ordinary logic. To bring probability into the picture, the logic needs to be augmented with enough probabilities of values of variables in the logic that the rest of the probabilities can be derived. All queries involving probabilities of values of variables then need to be conditioned on a logic containing those variables, in order to be well defined.
Typical example: a Bayes net is a logic with a finite set of variables, one per node in the net, augmented with some conditional probabilities for each node such that we can derive all probabilities.
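For concreteness, here’s a minimal sketch of such a net in Python (a hypothetical three-node net; all the conditional probabilities are made up):

```python
from itertools import product

# A toy Bayes net: Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass.
# The "augmentation": one (conditional) probability table per node.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(sprinkler | rain)
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): 0.99, (True, False): 0.8,  # P(wet=True | rain, sprinkler)
         (False, True): 0.9, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet), derived as the product of the node tables."""
    pw = P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[rain][sprinkler] * (pw if wet else 1 - pw)

# From the joint, any probability query is derivable, e.g. P(WetGrass=True):
print(sum(joint(r, s, True) for r, s in product([True, False], repeat=2)))
```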
Most of the interesting questions of world modeling are then about “model comparison” (though a logician would probably rather call it “logic comparison”): we want to have multiple hypotheses about which logics-augmented-with-probabilities best predict some real-world system, and test those hypotheses statistically just like we test everything else. That’s why we need model comparison.
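As a minimal sketch of what such a statistical test looks like (hypothetical models and made-up data):

```python
# Two hypothetical "logics augmented with probabilities" for the same
# system: M1 says a coin is fair, M2 says P(heads) = 0.7.
data = [1, 1, 0, 1, 1, 1, 0, 1]  # observed flips, 1 = heads

def likelihood(p_heads, flips):
    """P(data | model) for an iid coin model."""
    out = 1.0
    for x in flips:
        out *= p_heads if x else 1 - p_heads
    return out

# Posterior odds = likelihood ratio * prior odds (priors taken equal here):
posterior_odds = likelihood(0.5, data) / likelihood(0.7, data)
print(posterior_odds)  # < 1: these data favor M2
```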
the thing a logician would call a “logic” or possibly a “logic augmented with some probabilities”

The main point of the article is that once you add probabilities you can’t do predicate calculus anymore. It’s a mathematical operation that’s not defined for the entities that you get when you do your augmentation.
Is the complaint that you can’t do predicate calculus on the probabilities? Because I can certainly use predicate calculus all I want on the expressions within the probabilities.
And if that is the complaint, then my question is: why do we want to do predicate calculus on the probabilities? Like, what would be one concrete application in which we’d want to do that? (Self-reference and things in that cluster would be the obvious use-case, I’m mostly curious if there’s any other use-case.)
Imagine you have a function f that takes a_1, a_2, …, a_n and returns b_1, b_2, …, b_m. The a_i are boolean states of the known world, and the b_j are boolean states of the world you don’t yet know. Because f uses predicate logic internally, you can’t modify it to take values between 0 and 1; you have to accept that it can only take boolean values.
When you do your probability augmentation you can easily add probabilities to a_1, a_2, …, a_n and have P(a_1), P(a_2), …, P(a_n), as those are part of the known world.
On the other hand, how would you get P(b_1), P(b_2), … , P(b_m)?
I’m not quite understanding the example yet. Two things which sound similar, but are probably not what you mean because they’re straightforward Bayesian models:
I’m given a function f: A → B and a distribution (a↦P[A=a]) over the set A. Then I push forward the distribution on A through f to get a distribution over B.
Same as previous, but the function f is also unknown, so to do things Bayesian-ly I need to have a prior over f (more precisely, a joint prior over f and A).
How is the thing you’re saying different from those?
Or: it sounds like you’re talking about an inference problem, so what’s the inference problem? What information is given, and what are we trying to predict?
I’m talking about a function that takes a one-dimensional vector of booleans A and returns a one-dimensional vector B. The function does not accept a one-dimensional vector of real numbers between 0 and 1.
To be able to “push forward” probabilities, f would need to be defined to handle probabilities.
The standard push forward here would be:
$$P[B = b] = \sum_a I[f(a) = b] \, P[A = a]$$
where I[...] is an indicator function. In terms of interpretation: this is the frequency at which I will see B take on value b, if I sample A from the distribution P[A] and then compute B via B = f(A).
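Here’s that push-forward as a minimal Python sketch (a toy f and made-up input probabilities). Note that f itself only ever sees boolean vectors, yet we still get a probability for each individual b_j:

```python
from itertools import product
from math import prod

def f(a):
    """Hypothetical boolean function: b_1 = a_1 AND a_2, b_2 = a_2 XOR a_3."""
    return (a[0] and a[1], a[1] != a[2])

p_a = [0.9, 0.5, 0.2]  # made-up probabilities P(a_i = True), taken independent

def P_A(a):
    """P[A=a] for independent boolean inputs."""
    return prod(p if bit else 1 - p for p, bit in zip(p_a, a))

# P[B=b] = sum over a of I[f(a)=b] * P[A=a]:
P_B = {}
for a in product([True, False], repeat=len(p_a)):
    P_B[f(a)] = P_B.get(f(a), 0.0) + P_A(a)

# ...and the joint over B gives the probabilities of the individual outputs:
p_b = [sum(p for b, p in P_B.items() if b[j]) for j in range(2)]
print(P_B)  # distribution over output vectors
print(p_b)  # [P(b_1 = True), P(b_2 = True)]
```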
What do you want to do which is not that, and why do you want to do it?
Most of the time, the data you gather about the world consists of a bunch of facts plus probabilities for the individual data points, and you would want the output to likewise be probabilities over individual data points.
As far as my own background goes, I have not studied logic or the math behind the AI algorithm that David Chapman wrote. I did study bioinformatics, and in that program we talked about the probability calculations done in bioinformatics, so I have some intuitions from that domain. So I’ll take a bioinformatics example, even though I don’t know exactly how to productively apply predicate calculus to it.
Suppose, for example, you get input data from gene sequencing along with billions of probabilities (a_1, a_2, …, a_n), and you want output data about whether or not individual genetic mutations exist, i.e. the individual P(b_1), P(b_2), …, P(b_m), and not just the joint P(B) = P(b_1) * P(b_2) * … * P(b_m).
If you have m = 100,000 possible genetic mutations, P(B) is a very small number with little robustness to error: a single bad b_x will propagate and make your total P(B) unreliable. You might have an application where getting b_234, b_9538, and b_33889 wrong is an acceptable error, because most of the values were good.
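(To make “very small number” concrete, a quick back-of-the-envelope in Python, with a made-up per-call accuracy:)

```python
from math import log10

# Hypothetical numbers: 100,000 mutation calls, each individually 99% reliable.
m, p_each = 100_000, 0.99
print(m * log10(p_each))  # log10 P(B) is about -436, i.e. P(B) ~ 10^-436
```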
To bring probability into the picture, the logic needs to be augmented with enough probabilities of values of variables in the logic that the rest of the probabilities can be derived.

I feel like this treats predicate logic as being “logic with variables”, but “logic with variables” seems more like Aristotelian logic than like predicate logic to me.
Another way to view it: a logic, possibly a predicate logic, is just a compact way of specifying a set of models (in the logician’s sense of the word “models”, i.e. the things a Bayesian would normally call “worlds”). Roughly speaking, to augment that logic into a probabilistic model, we need to also supply enough information to derive the probability of each (set of logician!models/Bayesian!worlds which all assign the same truth-values to all sentences expressible in the logic).
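As a toy concrete version of that (hypothetical atoms and made-up probabilities):

```python
from itertools import product

# Hypothetical logic: atoms p, q and the single axiom (p OR q).
atoms = ("p", "q")
worlds = list(product([True, False], repeat=len(atoms)))  # all truth assignments
models = [(p, q) for (p, q) in worlds if p or q]          # the logician!models

# The augmentation: one probability per world, summing to 1 (made-up numbers).
P = dict(zip(models, [0.5, 0.3, 0.2]))

# Now every query about sentences of the logic is derivable, e.g. P(p):
print(sum(pr for (p, q), pr in P.items() if p))  # 0.8
```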
Does that help?
Idk, I guess the more fundamental issue is that this treats the goal as simply being to assign probabilities to statements in predicate logic, whereas his point is more about whether one can do compositional reasoning about relationships while dealing with nebulosity, and it’s this latter thing that’s the issue.
What’s a concrete example in which we want to “do compositional reasoning about relationships while dealing with nebulosity”, in a way not handled by assigning probabilities to statements in predicate logic? What’s the use-case here? (I can see a use-case for self-reference; I’m mainly interested in any cases other than that.)
You seem to be assuming that predicate logic is unnecessary, is that true?
No, I explicitly started with “you have at least one plain old (predicate) logic”. Quantification is fine.
Ah, sorry, I think I misparsed your comment.