I accidentally wrote the following as a Facebook comment to a friend, and am blatantly saving it here on the grounds that it might become the core of a personal statement/research statement for the previously-mentioned PhD application:
Many or even most competently-made proposals for AGI currently rely on Bayesian-statistical reasoning to handle the inherent uncertainty of real-world data and to provide learning and generalization capabilities. Despite this, I find that the foundation of probability theory is still done using Kolmogorov’s frequentist axiomatization in terms of measure theory for spaces of total measure 1.0, while Bayesian statistics still justifies itself in terms of Cox’s Theorem, even as it pitches itself as the only reasonable extension of logic into real-valued uncertainty.
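For concreteness, here is a minimal sketch of the two foundations being contrasted (standard textbook statements, nothing specific to any particular AGI proposal):

\[
\textbf{Kolmogorov:}\quad P : \mathcal{F} \to [0,1] \text{ on a } \sigma\text{-algebra } \mathcal{F} \subseteq 2^{\Omega},\qquad P(\Omega)=1,\qquad P\Big(\bigcup_i A_i\Big)=\sum_i P(A_i) \text{ for disjoint } A_i.
\]

\[
\textbf{Cox:}\quad \text{any plausibility } \mathrm{pl} \text{ on propositions satisfying his desiderata is isomorphic to a } P \text{ obeying}\quad P(a \wedge b \mid X)=P(a \mid b,X)\,P(b \mid X),\qquad P(a \mid X)+P(\lnot a \mid X)=1.
\]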
Problem: you can’t talk about “rational agents believe according to Bayes” if you’re trying to build a rational agent, because real-world agents have to be able to generalize above the level of propositional logic to quantified formulas (formulas in which forall and exists quantifiers can appear), both first-order and sometimes higher-order. First-order and higher-order logics of certainty (of various kinds: classical, intuitionistic, relevance, linear, substructural) have been formalized in terms of model theory for definitions of truth and in terms of computation for definitions of proof; this then gets extended into different mathematical foundations like ZFC, type theory, categorical foundations, etc.
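As a toy illustration of the gap (my example, not part of the original comment): a propositional credence can be read off a finite table of atomic sentences, but a first-order sentence such as

\[
\varphi \;=\; \forall x.\ \mathrm{Raven}(x) \rightarrow \mathrm{Black}(x)
\]

quantifies over a possibly infinite domain, so assigning \(\Pr(\varphi)\) requires a distribution over models (or over proofs) of the theory, not just a joint distribution over finitely many atoms.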
I have done cursory Google searches to find extensions of probability into higher-order logics, and found nothing on the same level of rigor as the rest of logic. This is a big problem if we want to rigorously state a probabilistic foundation for mathematics, and we want to do that because a “foundation for mathematics” in terms of a logic, its model(s), and its proof system(s) is really a description of how to mechanize (computationally encode) knowledge. If your mathematical foundation can’t encode quantified statements, that’s a limit on what your agent can think. If it can’t encode uncertainty, that’s a limit on what your agent can think. If it can’t encode those two together, THAT’S A LIMIT ON WHAT YOUR AGENT CAN THINK.
Which means that most AI/AGI work actually doesn’t bother with sound reasoning.
Hi Eli,

A lot of effort in AI went into combining the advantages of logic and probability theory for representing things. Languages that admit uncertainty and are strictly more powerful than propositional logic are practically a cottage industry now. There is Brian Milch’s BLOG, Pedro Domingos’s Markov logic networks, etc., etc.

Have you read Joe Halpern’s paper on semantics:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.5699
They culminate in the present-day probabilistic programming field, which is the subject studied by the lab I’m about to go visit in a few short hours. They are exactly the approach to this problem that I think makes sense: treat the search for a program as a search for a proof of a proposition, then make programs represent distributions over proofs rather than single proofs, then probabilize everything and make the various forms of statistical inference correspond to updating those distributions over proofs, yielding statistically learned, logically rich knowledge about arbitrary constructions. Curry-Howard + probability = fully general probabilistic models.
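A minimal sketch of that idea in plain Python (my illustration, not the lab’s actual system): a generative program defines a distribution over its own execution traces, and conditioning on an observation updates that distribution. Inference here is done by the crudest possible method, rejection sampling, just to make the “distribution over runs” reading concrete.

```python
import random

def flip(p):
    """An elementary random choice: one probabilistic 'step' in a run."""
    return random.random() < p

def model():
    """A tiny generative program. Each call produces one execution trace;
    the program as a whole denotes a distribution over such traces."""
    rain = flip(0.2)
    sprinkler = flip(0.05 if rain else 0.4)
    grass_wet = flip(0.95 if (rain or sprinkler) else 0.05)
    return {"rain": rain, "sprinkler": sprinkler, "grass_wet": grass_wet}

def posterior(condition, query, samples=100_000):
    """Rejection sampling: discard traces inconsistent with the observation,
    i.e. update the distribution over traces given the evidence."""
    kept = [t for t in (model() for _ in range(samples)) if condition(t)]
    return sum(query(t) for t in kept) / len(kept)

# P(rain | grass is wet): inference as conditioning the trace distribution.
print(posterior(lambda t: t["grass_wet"], lambda t: t["rain"]))
```

Swap the toy trace for a proof term in a rich type theory and rejection sampling for a serious inference algorithm, and you get the Curry-Howard-plus-probability picture described above.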
So, why does anyone still consider “logical probability” an actual problem, given that these all exist? I am frustratingly ready to raise my belief in the sentence, “Academia solved what LW (and much of the rest of the AI community) still believes are open problems decades ago, but in such thick language that nobody quite realized it.”
I mean, Hutter published a 52-page paper on probability values for sentences in first-order logic just last year, and I generally consider him professional-level competent.

Not yet. I’m looking it over now.
I find that the foundation of probability theory is still done using Kolmogorov’s frequentist axiomatization in terms of measure theory for spaces of total measure 1.0, while Bayesian statistics still justifies itself in terms of Cox’s Theorem
Can you expand on that? The connection between Kolmogorov’s and Cox’s foundations and frequentist vs. Bayesian interpretations is not clear to me. The only mathematical difference is that Cox’s axioms don’t give you countable additivity, but that doesn’t seem to be a frequentist vs. Bayesian point of dispute.
The connection between Kolmogorov’s and Cox’s foundations and frequentist vs. Bayesian interpretations is not clear to me.
Kolmogorov’s axiomatization treats events as elements of a powerset rather than as propositions. Cox’s axioms give you belief-allocation among any finite number of quantifier-free propositions, which then lets you derive countable additivity quickly from there.
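In symbols, the distinction being drawn here (my gloss, not part of the original exchange): the two theories assign degrees to different kinds of objects.

\[
\text{Kolmogorov:}\quad P : \mathcal{F} \to [0,1], \qquad \mathcal{F} \subseteq 2^{\Omega} \text{ a } \sigma\text{-algebra of subsets of a sample space } \Omega.
\]

\[
\text{Cox:}\quad \mathrm{pl}(\,\cdot \mid \cdot\,) : \text{(quantifier-free propositions)} \times \text{(propositions)} \to \mathbb{R}, \qquad \text{constrained only by consistency desiderata.}
\]

Kolmogorov’s P is defined on sets of outcomes; Cox’s plausibility is defined directly on sentences, which is what makes it the natural starting point for a logic-flavored treatment.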
Whereas I kinda think that distributions and degrees of belief ought to be themselves foundational, degrading to classical logic in the limit as we approach certainty. This is the general approach taken in the probabilistic programming community.
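One way to make the “degrading to classical logic” claim concrete (my illustration): when every credence is pinned to 0 or 1, the product and sum rules

\[
P(A \wedge B) = P(A)\,P(B \mid A), \qquad P(A \vee B) = P(A) + P(B) - P(A \wedge B)
\]

reduce to the Boolean truth tables for conjunction and disjunction, and conditioning on certain evidence behaves like modus ponens: if \(P(A)=1\) and \(P(B \mid A)=1\), then \(P(B)=1\).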