Comparing different recognition systems is complex, and it’s important to compare apples to apples. CNNs are comparable only to rapid feedforward recognition in the visual system which can be measured with rapid serial presentation experiments. In an untimed test the human brain can use other modules, memory fetches, multi-step logical inferences, etc (all of which are now making their way into ANN systems, but still).
The RSP setup ensures that the brain can only use a single feedforward pass from V1 to PFC, without using more complex feedback and recurrent loops. It forces the brain to use a network configuration similar to what current CNN used—CNNs descend from models of that pathway, after all.
In those test CNNs from 2013 rivaled primate IT cortex representations 1, and 2015 CNNs are even better.
That paper uses a special categorization task with monkeys, but the results generalize to humans as well. There are certainly some mistakes that a CNN will make which a human would not make even with the 150ms time constraint, but the CNNs make less mistakes for the more complex tasks with lots of categories, whereas humans presumably still have lower error for basic recognition tasks (but to some extent that is because researchers haven’t focused much on getting to > 99.9% accuracy on simpler recognition tasks).
Comparing different recognition systems is complex, and it’s important to compare apples to apples. CNNs are comparable only to rapid feedforward recognition in the visual system which can be measured with rapid serial presentation experiments. In an untimed test the human brain can use other modules, memory fetches, multi-step logical inferences, etc (all of which are now making their way into ANN systems, but still).
The RSP setup ensures that the brain can only use a single feedforward pass from V1 to PFC, without using more complex feedback and recurrent loops. It forces the brain to use a network configuration similar to what current CNN used—CNNs descend from models of that pathway, after all.
In those test CNNs from 2013 rivaled primate IT cortex representations 1, and 2015 CNNs are even better.
That paper uses a special categorization task with monkeys, but the results generalize to humans as well. There are certainly some mistakes that a CNN will make which a human would not make even with the 150ms time constraint, but the CNNs make less mistakes for the more complex tasks with lots of categories, whereas humans presumably still have lower error for basic recognition tasks (but to some extent that is because researchers haven’t focused much on getting to > 99.9% accuracy on simpler recognition tasks).
Cool, thanks for the paper, interesting read!