I strongly suspect that a cloned hidden Markov model is going to do worse in any benchmark where there’s a big randomly-ordered set of training / testing data, which I think is typical for ML benchmarks. I think its strength is online learning and adapting in a time-varying environment (which of course brains need to do), e.g. using this variant. Even if you find such a benchmark, I still wouldn’t be surprised if it lost to DNNs. Actually I would be surprised if you found any benchmark where it won.
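(For readers who haven’t seen it: as I understand it, the defining trick of a cloned HMM is that each hidden state is a “clone” of exactly one observation symbol, so emissions are deterministic and the forward pass only ever touches the clones of the symbol actually observed. A minimal numpy sketch of that structure — toy transition matrix and clone counts are made up, and this is not Vicarious’s actual code:)

```python
import numpy as np

def chmm_loglik(T, clones, seq, prior=None):
    """Log-likelihood of a symbol sequence under a cloned HMM.

    T      : (N, N) row-stochastic transition matrix over all hidden states
    clones : dict mapping each observation symbol -> array of its hidden-state ids
    seq    : list of observation symbols
    prior  : (N,) initial distribution over hidden states (uniform if None)
    """
    N = T.shape[0]
    prior = np.full(N, 1.0 / N) if prior is None else prior

    # Emissions are deterministic: a hidden state can only emit the symbol
    # it is a clone of, so the forward message only needs entries for the
    # clones of the currently observed symbol.
    alpha = prior[clones[seq[0]]].astype(float)
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()

    for prev, cur in zip(seq[:-1], seq[1:]):
        # Restrict the transition matrix to (clones of prev) x (clones of cur).
        alpha = alpha @ T[np.ix_(clones[prev], clones[cur])]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()  # rescale to avoid numerical underflow
    return loglik

# Toy usage: 2 symbols with 3 clones each (numbers purely illustrative).
rng = np.random.default_rng(0)
T = rng.random((6, 6))
T /= T.sum(axis=1, keepdims=True)
clones = {"a": np.array([0, 1, 2]), "b": np.array([3, 4, 5])}
print(chmm_loglik(T, clones, ["a", "b", "a", "a", "b"]))
```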
I take (some) brain-like algorithms seriously for reasons that are not “these algorithms are proving themselves super useful today”. Vicarious’s robots might change that, but that’s not guaranteed. Instead there’s a different story which is “we know that reverse-engineered high-level brain algorithms, if sufficiently understood, can do everything humans do, including inventing new technology etc. So finding a piece of that puzzle can be important because we expect the assembled puzzle to be important, not because the piece by itself is super useful.”
The point of benchmarking something is not necessarily to see whether it’s “better”, but to see how much worse it is.
For example, a properly tuned FCNN will almost always beat a gradient booster at a mid-sized problem (say, < 100,000 features once you bucketize your numeric columns, which a gradient booster requires anyway, and one-hot encode your categories, and < 100,000 samples).
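To make that concrete, here’s roughly the kind of setup I have in mind for the two contenders (scikit-learn, with made-up column names and hyperparameters — a sketch of the preprocessing split, not a tuned recipe):

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder, StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

numeric_cols = ["age", "income"]          # hypothetical columns
categorical_cols = ["region", "device"]   # hypothetical columns

# Gradient booster: bucketize the numeric columns, one-hot encode categoricals.
gb_pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("buckets", KBinsDiscretizer(n_bins=32, encode="ordinal"), numeric_cols),
        ("ohe", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])),
    ("model", GradientBoostingClassifier()),
])

# Fully-connected net: scale the numeric columns, one-hot encode categoricals.
fcnn_pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("scale", StandardScaler(), numeric_cols),
        ("ohe", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])),
    ("model", MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=200)),
])

# gb_pipeline.fit(X_train, y_train); fcnn_pipeline.fit(X_train, y_train)
```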
But gradient boosting has many other advantages: training time, stability, ease of tuning, efficient ways of fitting on both CPUs and GPUs, more flexibility in trading off compute against memory usage, metrics for feature importance, potentially faster inference, and potentially easier online training (though those last two are arguable and kind of beside the point — they aren’t the main advantages).
So really, as long as benchmarks tell me a gradient booster is usually just 2-5% worse than a finely tuned FCNN on this imaginary set of “mid-sized” tasks, I’d jump at the option to never use FCNNs here again, even if the benchmarks came up seemingly “against” them.
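In other words, the decision rule is just: measure the relative gap on held-out data and take the booster if it’s within a few percent. Something like this, reusing the hypothetical pipelines above (`X`, `y` stand in for the dataset; the 5% cutoff is from the paragraph above, everything else is illustrative):

```python
from sklearn.model_selection import cross_val_score

# Hypothetical: X, y is the mid-sized tabular dataset from the sketch above.
gb_score = cross_val_score(gb_pipeline, X, y, cv=5, scoring="roc_auc").mean()
nn_score = cross_val_score(fcnn_pipeline, X, y, cv=5, scoring="roc_auc").mean()

relative_gap = (nn_score - gb_score) / nn_score
if relative_gap <= 0.05:  # "2-5% worse" is an acceptable trade
    print(f"GB is only {relative_gap:.1%} behind -- take the booster")
else:
    print(f"GB is {relative_gap:.1%} behind -- the FCNN earns its keep")
```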
Interesting!
I guess I should add: an example I’m slightly more familiar with is anomaly detection in time-series data. Numenta developed the “HTM” brain-inspired anomaly detection algorithm (actually Dileep George did all the work back when he worked at Numenta, I’ve heard). Then I think they licensed it into a system for industrial anomaly detection (“the machine sounds different now, something may be wrong”), but it was a modular system, so you could switch out the core algorithm, and it turned out that HTM wasn’t doing better than the other options. This is a vague recollection, I could be wrong in any or all details. Numenta also made an anomaly detection benchmark related to this, but I just googled it and found this criticism. I dunno.