The prevailing misconception is that by assuming that ‘the future will be like the past’, it can ‘derive’ (or ‘extrapolate’ or ‘generalise’) theories from repeated experiences by an alleged process called ‘induction’. But that is impossible. I myself remember, for example, observing on thousands of consecutive occasions that on calendars the first two digits of the year were ‘19’. I never observed a single exception until, one day, they started being ‘20’. Not only was I not surprised…
I think this paragraph illustrates the key failure of Deutsch’s stance: he assumes all statistical methods must be fundamentally naive. This is about equivalent to assuming all statistical methods operate on small data sets. Of course, if your entire data set is a moderately long list of numbers that all begin with the number 19, your statistical method will naively predict that the 19s will continue with high probability. But this restriction on the size and complexity of the data set is completely arbitrary. Humans experience, and learn from, an enormously vast data set containing language, images, sound, sensorimotor feedback, and more; all of it indexed by a time variable that permits correlational analysis (the man’s lips moved in a certain way and the word ‘kimchi’ came out). The human learning process constructs, with some degree of success, complex world theories that describe this vast data set. When the brain perceives a sequence of dates, as Deutsch mentions, it does not analyze the sequence in isolation and create a simple standalone theory to do the prediction; rather it understands that the sequence is embedded in a much larger web of interrelated data, and correctly applies the complex world theory to produce the right prediction. In other words, though both the data set and the chosen hypothesis are large and complex, the operation is essentially Bayesian in character. Human brains certainly assume the future is like the past, but we know that the past is more complex than a simple sequential theory would predict; when the future is genuinely unlike the past, humans run into serious difficulty.
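To make the contrast concrete, here is a toy sketch (mine, not Deutsch’s; the start date and the likelihood rule are purely illustrative): a naive rule-of-succession extrapolation applied to the digit sequence in isolation, next to a ‘world theory’ that knows the observable is a year that increments.

```python
# Toy sketch (illustrative numbers only): two models looking at the same run of years.
years_seen = list(range(1919, 2000))   # 81 consecutive observations ending in 1999

# Model 1, the sequence in isolation: Laplace's rule of succession on the
# event "the year starts with '19'".
n = len(years_seen)
k = sum(1 for y in years_seen if str(y).startswith("19"))
p_naive = (k + 1) / (n + 2)            # (81 + 1) / (81 + 2), roughly 0.988

# Model 2, the sequence embedded in a 'world theory': the observable is a
# count that increments by one each year, so just compute the next value.
next_year = years_seen[-1] + 1
p_world = 1.0 if str(next_year).startswith("19") else 0.0

print(f"Sequence-in-isolation model: P(next starts with '19') = {p_naive:.3f}")
print(f"Embedded-in-a-world model:   P(next starts with '19') = {p_world:.3f}")
```

The second model isn’t cleverer statistics; it simply uses structure that the first model never gets to see.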
the key failure of Deutsch’s stance: he assumes all statistical methods must be fundamentally naive.
I don’t want to speak for Deutsch, but since I’m sympathetic to his point of view I’ll point out that a better way to formulate the issue is this: all statistical methods rest on some assumptions, and when those assumptions break, the methods fail.
This is about equivalent to assuming all statistical methods operate on small data sets.
Not at all. The key issue isn’t the size of the data set; it’s the stability of the underlying process.
To use the calendar example, you can massively increase your data set by sampling not every day but, say, every second. And yet this will not help you one little bit.
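A back-of-the-envelope sketch of the point, using Laplace’s rule of succession purely as a stand-in for ‘some statistical method’ (the sampling numbers are invented):

```python
# Rule of succession: P(next observation shows '19') = (n + 1) / (n + 2)
# when all n samples so far showed '19'. Compare two sampling rates.
daily_samples      = 81 * 365              # one look per day over 81 years
per_second_samples = 81 * 365 * 86_400     # one look per second over the same years

for n in (daily_samples, per_second_samples):
    p = (n + 1) / (n + 2)
    print(f"n = {n:>13,}  ->  P(next shows '19') = {p:.15f}")
```

Five extra orders of magnitude of data just sharpen a prediction that is about to be wrong; the problem is the assumed stability of the process, not the sample size.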
restriction on the size and complexity of the data set is completely arbitrary.
Not really. Things have to be computable before the heat death of the universe. Or, less dramatically and more practically, the answer to the question must arrive while there is still a need for an answer. This imposes rather serious restrictions on the size and complexity of the data you can deal with.
and correctly applies the complex world theory to produce the right prediction.
Sometimes correctly. And sometimes incorrectly. Brains operate more by heuristics than by statistical methods, and the observation that a heuristic can be useful doesn’t help you pin down the constraints under which statistical methods will work.
You do realize that people are working on logical uncertainty under limited time, and that this could tell an AI how to re-examine its assumptions? I admit that Gaifman at Columbia deals only with a case where we know the possibilities beforehand (at least in the part I read). But if the right answer has a description in the language we’re using, then it seems like E.T. Jaynes addresses this, at least in principle, when he recommends having an explicit probability for ‘other hypotheses.’
Then again, if this approach didn’t come up when the authors of “Tiling Agents” discuss utility maximization, perhaps I’m overestimating the promise of formalized logical uncertainty.
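To sketch what I have in mind with the Jaynes move (a toy model: the priors and the catch-all’s diffuse likelihood are invented, and choosing that likelihood is of course the hard part):

```python
# Two named hypotheses about a calendar's two-digit prefix, plus a Jaynes-style
# explicit probability for 'other hypotheses' with a deliberately diffuse likelihood.
hypotheses = {
    "always '19'":      lambda x: 1.0 if x == "19" else 0.0,
    "always '20'":      lambda x: 1.0 if x == "20" else 0.0,
    "other hypotheses": lambda x: 1 / 90,   # uniform over prefixes '10'..'99'
}
posterior = {"always '19'": 0.495, "always '20'": 0.495, "other hypotheses": 0.01}

observations = ["19"] * 81 + ["20"]          # eighty-one '19's, then the rollover

for i, x in enumerate(observations, start=1):
    # Standard Bayesian update: multiply by likelihoods, then renormalize.
    posterior = {h: p * hypotheses[h](x) for h, p in posterior.items()}
    total = sum(posterior.values())
    posterior = {h: p / total for h, p in posterior.items()}
    if i >= len(observations) - 1:           # show the last two updates
        print(f"after observation {i} ({x}): "
              + ", ".join(f"P({h}) = {p:.3g}" for h, p in posterior.items()))
```

Right up to the 81st observation almost all the mass sits on the first hypothesis; the single surprising ‘20’ wipes out both named hypotheses and dumps the mass on ‘other hypotheses’, which is the cue to go construct a better hypothesis.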