Then I’d test it on 3D movements. The point is that these tests have great validity as tests for general intelligence (or something in the vicinity), provided the programmer isn’t deliberately optimising or calibrating their machine on them.
If you’d designed a chatterbot and it turned out to be great at playing music (and that wasn’t something you’d put in by hand), then that would be strong evidence for general intelligence.
The deliberate optimization on the part of a designer is just an example of the sort of thing you are concerned about here, right? That is, if I used genetic algorithms to develop a system X, and exposed those algorithms to a set of environments E, X would be optimized for E and consequently any test centered on E (or any subset of it) would be equally unreliable as a test of general intelligence… the important thing is that because X was selected (intentionally or otherwise) to be successful at E, the fact that X is successful at E ought not be treated as evidence that X is generally intelligent.
Yes?
Similarly, the fact that X is successful at tasks not actually present in E, but nevertheless very similar to tasks present in E, ought not be treated as evidence that X is generally intelligent. A small amount of generalization from initial inputs is not that impressive.
The question then becomes how much generalization away from the specific problems presented in E is necessary before we consider X generally intelligent.
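Here is a toy illustration of that question (a hypothetical sketch, not anything proposed in the discussion itself): evolve a simple system on a training set of environments E with a genetic-algorithm-style search (mutation and selection only, no crossover), then measure how its performance falls off as the test environments move further away from E. The task, the two-parameter candidate model, and every constant below are invented purely for illustration.

```python
# Hypothetical sketch: a system "X" evolved on environments E, then evaluated
# on environments at increasing distance from E. All details are made up.
import random

random.seed(0)

def task(x):
    # The behaviour the environments reward: a simple nonlinear function.
    return x * x

def fitness(weights, inputs):
    # Negative squared error of a 2-parameter linear model against the task.
    return -sum((weights[0] + weights[1] * x - task(x)) ** 2 for x in inputs)

def evolve(inputs, pop_size=50, generations=200):
    # Toy mutate-and-select search over 2-parameter candidates.
    population = [[random.uniform(-1, 1), random.uniform(-1, 1)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, inputs), reverse=True)
        survivors = population[: pop_size // 2]
        children = [[w + random.gauss(0, 0.05) for w in random.choice(survivors)]
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda w: fitness(w, inputs))

# E: training environments sample inputs from [0, 1].
E_inputs = [i / 20 for i in range(21)]
X = evolve(E_inputs)

# Evaluate X on environments at increasing distance from E.
for lo, hi in [(0, 1), (1, 2), (3, 4)]:
    test_inputs = [lo + (hi - lo) * i / 20 for i in range(21)]
    mse = -fitness(X, test_inputs) / len(test_inputs)
    print(f"inputs in [{lo}, {hi}]: mean squared error = {mse:.3f}")
```

Running this, error is low on E and on environments adjacent to E, and grows rapidly further out, which is exactly the pattern the comment calls unimpressive: near-E success is what selection for E buys you for free, so only performance well away from E would count as evidence of something more general.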
To approach the question differently—there are all kinds of cognitive tests which humans fail, because our cognitive systems just weren’t designed to handle the situations those tests measure, because our ancestral environment didn’t contain sufficiently analogous situations. At what point do we therefore conclude that humans aren’t really generally intelligent, just optimized for particular kinds of tests?