Intelligometry

Opinions about the future and expert elicitation

Predictions of our best experts, even when statistically evaluated, remain biased. Thank you, Katja, for contributing additional results and compiling charts. But enlarging the number of people asked will not improve predictive quality. It would be amusing to see the results of a poll on HLMI timelines within our reading group, but it would only tell us who we are and nothing about the future of AGI. Everybody in our reading group is at least somewhat biased by having read chapters of Nick Bostrom's book. Groupthink and biased perception are the biggest obstacles when predicting the future. Expert elicitation is not a scientific methodology; it is collective educated guessing.
Trend extrapolation
Luke Muehlhauser commented on Ray Kurzweil's success in predicting when a chess program would first defeat the human world champion:
Those who forecasted this event with naive trend extrapolation (e.g. Kurzweil 1990) got almost precisely the correct answer (1997).
He opened my eyes by admitting:
Hence, it may be worth searching for a measure for which (a) progress is predictable enough to extrapolate, and for which (b) a given level of performance on that measure robustly implies the arrival of Strong AI. But to my knowledge, this has not yet been done, and it’s not clear that trend extrapolation can tell us much about AI timelines until such an argument is made, and made well.
For Weak AI problems, trend extrapolation works. In image processing research it is common to accept computing times of minutes for a single frame of a real-time video sequence: hardware and software will advance and can be scaled, so within five years such a new algorithm will become real-time capable. Weak AI capability is easily measurable, and the scaling behavior of many Weak AI problems (e.g. those involving search trees) is dominantly linear and therefore predictable.
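As a toy illustration of such an extrapolation (all numbers invented, a sketch rather than real data): fit an exponential trend to the per-frame computing time of an algorithm and project when it crosses the real-time threshold.

```python
import numpy as np

# Hypothetical measurements of per-frame processing time (seconds per frame).
# All numbers are invented for illustration only.
years = np.array([2010, 2011, 2012, 2013, 2014])
frame_time_s = np.array([300.0, 170.0, 95.0, 55.0, 30.0])

# Fit a line to log(frame time) vs. year, i.e. assume exponential improvement.
slope, intercept = np.polyfit(years, np.log(frame_time_s), 1)

# Extrapolate to the real-time threshold of 0.04 s per frame (25 fps).
target = 0.04
year_real_time = (np.log(target) - intercept) / slope
print(f"Frame time halves roughly every {np.log(2) / -slope:.1f} years")
print(f"Projected real-time capability around {year_real_time:.0f}")
```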
For Strong AI, let's make trend prediction work! Let's call our tool Intelligometry. I coined this term today, and I hope it will bring us closer to scientific methodology and predictability.
Intelligometry: the theory of multidimensional metrics for measuring skills and intelligence. The field of intelligometry involves the development and standardization of tests that provide objective comparability between human intelligence (HI) and AI systems.
Unfortunately, the foundations of intelligence metrics are scarce. The anthropocentric IQ measure, with a mean of 100 and a standard deviation of 15 (by definition), is the only widely accepted intelligence metric for humans. Short IQ tests cover only a ±2 sigma range, giving results from 70 to 130; extensive tests cover up to 160.
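A quick sketch of how much of the human population these ranges cover, using only the defining parameters (mean 100, SD 15):

```python
from statistics import NormalDist

# The IQ scale is defined as a normal distribution: mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

def population_share(lo, hi):
    """Fraction of the population whose IQ falls between lo and hi."""
    return iq.cdf(hi) - iq.cdf(lo)

print(f"Short tests, 70-130 (2 sigma): {population_share(70, 130):.1%}")
print(f"Extensive tests, up to 160 (4 sigma): {iq.cdf(160):.4%} covered")
```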
Howard Gardner's theory of multiple intelligences could be a starting point for test designs. He identifies nine intelligence modalities:
Musical–rhythmic and harmonic
Visual–spatial
Verbal–linguistic
Logical–mathematical
Bodily–kinesthetic
Interpersonal
Intrapersonal
Naturalistic
Existential
Although the theory has drawn some criticism and has only marginal empirical support, it has stimulated education. It may be that humans have highly intercorrelated intelligence modalities, so the benefit of this differentiation for humans is limited. Applied to AI systems with various architectures, however, we can expect to find significant differences between modalities.
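To make the multidimensional idea concrete, a minimal sketch of such a profile (modality names from Gardner's list above; all scores are invented placeholders): a human profile is comparatively flat, while a narrow AI is an extreme spike on one axis.

```python
from dataclasses import dataclass

# Gardner's nine modalities as axes of a multidimensional profile.
MODALITIES = [
    "musical-rhythmic", "visual-spatial", "verbal-linguistic",
    "logical-mathematical", "bodily-kinesthetic", "interpersonal",
    "intrapersonal", "naturalistic", "existential",
]

@dataclass
class Profile:
    name: str
    scores: dict  # modality -> capability relative to a human baseline of 1.0

# Placeholder values: human modalities are intercorrelated and fairly flat,
# while a narrow AI (e.g. a chess engine) spikes on a single axis.
human = Profile("median human", {m: 1.0 for m in MODALITIES})
chess_engine = Profile("chess engine", {
    **{m: 0.0 for m in MODALITIES},
    "logical-mathematical": 100.0,  # placeholder: 100x human on this axis only
})

for m in MODALITIES:
    print(f"{m:22s} human: {human.scores[m]:6.1f}   AI: {chess_engine.scores[m]:6.1f}")
```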
Huge differences between AI capabilities and those of humans and other AIs make a linear scale impractical; artificial intelligence measures should be defined on a logarithmic scale. Two examples: to multiply two 8-digit numbers, a human might need 100 s. A 10 MFlops smartphone processor performs roughly 1E9 times as many multiplications in the same time, and RIKEN's K computer (4th on the Top500 list) with 10 PFlops is 1E18 times faster than a human. Conversely, a firefighter can run through complex unknown rooms maybe 100 times faster than a RoboCup Rescue challenge robot; on this scale the robot is 1E-2 times as “fast”.
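A minimal sketch of such a logarithmic capability score, using the rough numbers above (the one-Flop-per-multiplication cost is my assumption):

```python
import math

def log_capability(ai_rate, human_rate):
    """Capability on a log10 scale: 0 = human level, 1 = 10x, -1 = 0.1x human."""
    return math.log10(ai_rate / human_rate)

# Multiplying two 8-digit numbers: a human needs ~100 s per multiplication.
# Assumption: one multiplication costs the machine roughly one Flop.
human_rate = 1 / 100      # multiplications per second
smartphone = 1e7          # 10 MFlops
k_computer = 1e16         # 10 PFlops (RIKEN K computer)

print(log_capability(smartphone, human_rate))   #  9.0 -> 1E9 times human
print(log_capability(k_computer, human_rate))   # 18.0 -> 1E18 times human

# Traversing complex unknown rooms: the firefighter is ~100x faster.
print(log_capability(1.0, 100.0))               # -2.0 -> robot at 1E-2 of human
```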
We should encourage other researchers to challenge humans with exactly the same tasks they give their machines, and to generate solid data for statistical analysis. Humans of both sexes and all age groups should be tested. Joint AI and psychology research would bring synergistic effects.
It is challenging to design tests that can discriminate advances of an AI at very low capability levels, e.g. from 1E-6 to 1E-5. Suppose a test consists of one million complex multiple-choice questions and the AI answers 10% correctly by guessing, i.e. about 100,000 by chance alone. An advance from 100,001 correct answers to 100,010 would mean that the AI's true understanding improved by a factor of 10, yet this tiny difference would probably remain undetected in the noise of guessing.
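A quick sketch of the numbers behind this (assuming the implied test size of one million questions): the binomial noise of guessing alone is about 300 correct answers, dwarfing the 9-answer signal.

```python
import math

# Hypothetical test: one million questions, 10% correct by pure guessing.
n_questions = 1_000_000
p_guess = 0.10

baseline = n_questions * p_guess                                # 100,000
noise_sigma = math.sqrt(n_questions * p_guess * (1 - p_guess))  # ~300

understood_before = 1    # capability 1E-6: one question truly solved
understood_after = 10    # capability 1E-5: ten questions truly solved

print(f"Chance baseline: {baseline:,.0f} +/- {noise_sigma:.0f} correct answers")
print(f"Signal: +{understood_after - understood_before} correct answers")
print(f"Signal-to-noise: {(understood_after - understood_before) / noise_sigma:.2f}")
```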
Intelligometry could supply the methodology and data we need for proper predictions. AI research should establish a standardized way of documenting results; these standards should be part of all AI curricula, and publicly funded AI-related research projects should use standardized tests and documentation schemes. If we manage to move from educated guessing to trend extrapolation on solid data within the next ten years (about three PhD generations), we will have achieved a lot: for the first time, a reliable basis for predictions. These predictions would be the solid ground on which our governments and research institutes can build global action plans towards a sustainable future for us humans.