Bayesian inference is just the latest in that long line. It may be the one true way to reason about uncertainty, as predicate calculus is the one true way to reason about truth and falsity, but that does not make it a universal algorithm for thinking.
I didn’t get the impression that Bayesian inference itself was going to produce intelligence; the impression I have is that Bayesian inference is the best possible interface with reality. Attach a hypothesis-generating module to one end and a sensor module to the other and that thing will develop the most correct hypotheses possible. We just don’t have any feasible hypothesis-generators.
I didn’t get the impression that Bayesian inference itself was going to produce intelligence
I do get that impression from people who blithely talk of “Bayesian superintelligences”. Example. What work is the word “Bayesian” doing there?
In this example, a Bayesian superintelligence is conceived as having a prior distribution over all possible hypotheses (for example, a complexity-based prior) and using its observations to optimally converge on the right one. You can even make a theoretically optimal learning algorithm that provably converges on the best hypothesis. (I forget the reference for this.) Where this falls down is that the hypothesis space explodes exponentially with complexity. There is no use in a perfect optimiser that takes longer than the age of the universe to do anything useful.
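For concreteness, here is a toy sketch of what that picture amounts to: exact Bayesian updating over an enumerated hypothesis space (my own illustration, using a uniform prior rather than a complexity prior). The hypotheses are all the boolean functions of n binary features, and there are 2^(2^n) of them, which is why "enumerate everything and update" stops being feasible almost immediately.

```python
# Toy sketch (illustration only): exact Bayesian updating over an enumerated
# hypothesis space. Hypotheses are all boolean functions of n binary features,
# so there are 2**(2**n) of them; the updating is trivial, the enumeration is not.

from itertools import product
import random

def enumerate_hypotheses(n_features):
    """Yield every boolean function of n binary features, i.e. every possible
    labelling of the 2**n_features inputs. There are 2**(2**n_features) of them."""
    inputs = list(product([0, 1], repeat=n_features))
    for labelling in product([0, 1], repeat=len(inputs)):
        yield dict(zip(inputs, labelling))

def posterior(hypotheses, priors, observations):
    """Exact Bayes with noiseless observations: zero out hypotheses contradicted
    by the data, then renormalise the remaining prior mass."""
    weights = []
    for h, p in zip(hypotheses, priors):
        consistent = all(h[x] == y for x, y in observations)
        weights.append(p if consistent else 0.0)
    total = sum(weights)
    return [w / total for w in weights]

n = 3                                   # 3 features -> 2**(2**3) = 256 hypotheses; n = 6 already gives ~1.8e19
hypotheses = list(enumerate_hypotheses(n))
priors = [1 / len(hypotheses)] * len(hypotheses)   # uniform prior, for brevity

true_h = random.choice(hypotheses)
observations = [(x, true_h[x]) for x in random.sample(list(true_h), 4)]

post = posterior(hypotheses, priors, observations)
print(len(hypotheses), "hypotheses;", sum(p > 0 for p in post), "still consistent")
```

The updating step is the cheap part; the killer is the first line of the main script. Swapping the uniform prior for a complexity-based one changes which surviving hypothesis gets the most mass, but does nothing about the size of the space being enumerated.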