It’s true that this is a case of logical uncertainty.
However, I must add that in most of my examples, I bring up the benefits of a probabilistic representation. Just because you have logical uncertainty doesn’t mean you need to represent it with probability theory.
In protein structure, we already have Bayesian methods for inferring the fold, so the point of the probabilistic representation is to plug it into those methods as a prior. In philosophy, we want ideal rationality, which suggests probability. In automated theorem proving… okay, yeah, in automated theorem proving I can’t explain why you’d want to use probability theory in particular.
But yes. If you had a principled way to turn your background information and already-completed computations into a probability distribution over future computations, you could use that for AI search problems. And optimization problems. Wow, that’s a lot of problems. I’m not sure how it would stack up against other methods, but it’d be interesting if that became a paradigm for at least some problems.
In fact, now that you’ve inspired me to look for it, I find that it’s being done! Not with the approach of coming up with a distribution over all mathematical statements, which is what you see in Christiano’s report and what I had in mind when writing the post, but rather with an approach like the one I think Cari Kaufman uses, where you guess based on nearby points. This is accomplished by modeling a difficult-to-evaluate function as a stochastic process with some kind of local correlations, like a Gaussian process, so that you get a probability distribution for the value of the function at each point. What I’m finding is that this is, in fact, an approach people use to optimize difficult-to-evaluate objective functions. See “Efficient Global Optimization of Expensive Black-Box Functions” by Jones, Schonlau and Welch for the details.
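To make that concrete, here’s a minimal sketch of that kind of loop: fit a Gaussian process to the evaluations made so far, then choose the next point to evaluate by maximizing expected improvement, roughly in the spirit of the Jones/Schonlau/Welch paper. The toy objective, kernel choice, and length scale below are stand-ins of mine, not anything from the paper:

```python
# Minimal sketch: model an expensive black-box function as a Gaussian
# process, then pick the next evaluation point by expected improvement.
import numpy as np
from scipy.stats import norm

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential kernel: correlations fall off with distance,
    # encoding the "nearby points are informative" assumption.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    # Standard GP regression equations (zero prior mean, unit prior variance).
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_train
    var = 1.0 - np.sum(K_s.T @ K_inv * K_s.T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # For minimization: E[max(best - f(x), 0)] under the GP posterior.
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

f = lambda x: np.sin(3 * x) + x ** 2          # pretend this is expensive
x_train = np.array([0.1, 0.5, 0.9])
y_train = f(x_train)

for _ in range(5):
    x_test = np.linspace(0, 1, 200)
    mu, sigma = gp_posterior(x_train, y_train, x_test)
    x_next = x_test[np.argmax(expected_improvement(mu, sigma, y_train.min()))]
    x_train = np.append(x_train, x_next)
    y_train = np.append(y_train, f(x_next))

print("best x found:", x_train[np.argmin(y_train)])
```

The appeal is exactly the one above: each expensive evaluation updates a probability distribution over the function’s values everywhere, and the acquisition rule spends the next evaluation where that distribution says it’s most likely to pay off.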
Surely probability or something very much like it is conceptually the right way to deal with uncertainty, whether it’s logical uncertainty or any other kind? Granted, most of the time you don’t want to deal with explicit probability distributions and Bayesian updates because the computation can be expensive, but when you work with approximations you’re better off if you know what it is you’re approximating.
In the area of search algorithms, I think these kinds of approaches are woefully underrepresented, and I don’t think it’s because they aren’t applicable. Granted, I could be wrong about that, since the core ideas aren’t new (see, for example, “Dynamic Probability, Computer Chess, and the Measurement of Knowledge” by I. J. Good).
It’s an area of research I’m working on right now, so I’ve spent a fair amount of time looking into it. I can give a few references on the topic, though on the whole they’re quite sparse.
Here’s some of the literature:
“Heuristic Search as Evidential Reasoning” by Hansson and Mayer
“A Bayesian Approach to Relevance in Game Playing” by Baum and Smith
and also work following Stuart Russell’s concept of “metareasoning” (a toy illustration follows the list):
“On Optimal Game-Tree Search using Rational Meta-Reasoning” by Russell and Wefald
“Principles of Metareasoning” by Russell and Wefald
and the relatively recent
“Selecting Computations: Theory and Applications” by Hay, Russell, Tolpin and Shimony.
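To give a flavor of the metareasoning idea, here’s a toy version of the Russell/Wefald selection rule: treat each candidate computation as an experiment, run whichever one has the highest myopic expected value of information, and stop when thinking no longer pays for its cost. The Gaussian belief model, noise variance, and cost below are illustrative assumptions of mine, not taken from those papers:

```python
# Toy metareasoning: an agent must pick one of three options, can refine
# its estimate of any option with a noisy "computation", and selects
# computations by myopic value of information.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
true_values = np.array([0.40, 0.55, 0.50])  # hidden from the agent
mu = np.zeros(3)     # current estimates of each option's value
var = np.ones(3)     # uncertainty (variance) of each estimate
noise_var = 0.25     # variance of a single computation's result
cost = 0.005         # cost of running one computation

def myopic_voi(i):
    # Expected gain from one more noisy evaluation of option i: the chance
    # the updated estimate changes which option looks best, weighted by
    # how much better the resulting decision becomes.
    best_other = np.delete(mu, i).max()
    s = var[i] / np.sqrt(var[i] + noise_var)  # sd of the updated mean
    if s < 1e-12:
        return 0.0
    z = abs(mu[i] - best_other) / s
    # Gaussian identity for E[max(new_mean, best_other)] - max(mu_i, best_other)
    return s * norm.pdf(z) - abs(mu[i] - best_other) * norm.cdf(-z)

for _ in range(100):
    gains = np.array([myopic_voi(i) for i in range(3)])
    i = int(np.argmax(gains))
    if gains[i] <= cost:   # metalevel stopping rule: further thought
        break              # no longer pays for itself, so act instead
    y = true_values[i] + rng.normal(0.0, np.sqrt(noise_var))
    post_var = 1.0 / (1.0 / var[i] + 1.0 / noise_var)
    mu[i] = post_var * (mu[i] / var[i] + y / noise_var)  # Bayesian update
    var[i] = post_var

print("agent picks option", int(np.argmax(mu)))
```

The point of the stopping rule is the metareasoning thesis in miniature: computations are actions too, chosen for their expected effect on the final decision.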
On the whole, though, the literature is relatively limited. At a bare minimum there is plenty of room for probabilistic representations to give search a better theoretical foundation, but I think there is also plenty of practical benefit to be gained from these techniques.
As a particular example of the applicability of these methods, there is a phenomenon referred to as “search pathology” or “minimax pathology”: for certain tree structures, searching deeper actually leads to worse decisions when value estimates are propagated up the tree by standard rules (most notably minimax). From a Bayesian perspective this clearly shouldn’t occur, since deeper search only adds evidence, so the pathology must be the result of a failure to correctly update on that evidence.
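To illustrate the mechanism in a toy setting (my own construction, much simpler than the setups in the pathology literature): backing up the raw maximum of noisy child evaluations systematically overestimates a node’s value, because the max tends to select the child whose evaluation noise happened to be luckiest, whereas backing up the posterior expectation of the max does not suffer from this. The priors and noise level below are arbitrary choices for the demo, not from any real engine:

```python
# Compare minimax-style backup (max of noisy point estimates) against a
# Bayesian backup (posterior expectation of the max) at a single max node.
import numpy as np

rng = np.random.default_rng(0)
children, sigma, trials, post_samples = 8, 1.0, 2000, 400

err_minimax, err_bayes = [], []
for _ in range(trials):
    theta = rng.normal(0.0, 1.0, children)          # true child values
    y = theta + rng.normal(0.0, sigma, children)    # noisy static evals
    true_parent = theta.max()

    # Minimax-style backup: take the largest point estimate at face value.
    est_minimax = y.max()

    # Bayesian backup: N(0,1) prior + N(theta, sigma^2) likelihood gives
    # posterior theta_i | y_i ~ N(y_i/(1+sigma^2), sigma^2/(1+sigma^2));
    # estimate E[max_i theta_i | y] by sampling from the posterior.
    post_mean = y / (1.0 + sigma ** 2)
    post_sd = np.sqrt(sigma ** 2 / (1.0 + sigma ** 2))
    draws = rng.normal(post_mean, post_sd, (post_samples, children))
    est_bayes = draws.max(axis=1).mean()

    err_minimax.append((est_minimax - true_parent) ** 2)
    err_bayes.append((est_bayes - true_parent) ** 2)

print("RMSE, minimax backup :", np.sqrt(np.mean(err_minimax)))
print("RMSE, Bayesian backup:", np.sqrt(np.mean(err_bayes)))
```

In a deep tree this error compounds level by level, alternating direction at min and max nodes, which is (roughly speaking) how deeper search can end up making worse decisions when point estimates are backed up as if they were exact.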