What if, as Gwern proposes, intelligence is simply “search over the space of Turing machines”, i.e. AIXI? This definition currently feels closest to the empirical realities of ML capabilities: ‘expert knowledge’ and ‘inductive bias’ have largely lost out to the cold realities of scaling compute and data.
> All we are doing when we are doing “learning,” or when we are doing “scaling,” is that we’re searching over more and longer Turing machines, and we are applying them in each specific case.
> Otherwise, there is no general master algorithm. There is no special intelligence fluid. It’s just a tremendous number of special cases that we learn and we encode into our brains.
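To make “searching over more and longer Turing machines” concrete, here is a toy Python sketch (my own illustration, not Gwern’s formalism or actual AIXI): enumerate tiny programs in order of length and return the shortest one that reproduces the observed data, in the spirit of Solomonoff induction.

```python
from itertools import product

# Toy "program search": enumerate short arithmetic expressions in one
# variable x, shortest first, and return the first one that reproduces
# the observed input-output pairs. A crude stand-in for searching over
# Turing machines ordered by length.

TOKENS = ["x", "1", "2", "+", "*", "-"]  # a tiny expression "machine code"

def run(program, x):
    """Interpret a token tuple as a Python expression; None on any failure."""
    try:
        return eval("".join(program), {"__builtins__": {}}, {"x": x})
    except Exception:
        return None

def shortest_consistent_program(examples, max_len=5):
    """Search programs in order of increasing length (simplest first)."""
    for length in range(1, max_len + 1):
        for program in product(TOKENS, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return "".join(program)
    return None

# Data generated by f(x) = 2*x + 1; the search recovers an equivalent
# short program, e.g. "x+x+1", without ever being told the rule.
print(shortest_consistent_program([(0, 1), (1, 3), (2, 5), (3, 7)]))
```

The point of the toy is only that searching over more and longer programs buys you more expressible special cases; it is not meant as a model of how current AIs actually work.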
That said, while I’m confident that literally memorizing special cases and searching a look-up table is not how current AIs work, there is an important degree of truth to this picture in general: we really are just searching over ever larger sets of objects (though in my case the search is not restricted to Turing machines, but ranges over any set defined formally using ZFC + Tarski’s Axiom at minimum; links below).
And more importantly, the maximal generalization of learning/intelligence is just that we are learning ever larger look-up tables; once you weaken your assumptions enough, optimal intelligences look like look-up tables nested inside other look-up tables.
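As a minimal sketch of that picture (illustrative names only, not a claim about any current system): a “learner” whose entire model is a dictionary, with entries that can themselves be further look-up tables.

```python
# Learning as building a look-up table: memorize every (input, output)
# pair seen, with no compression and no generalization. Entries may
# themselves be look-up tables, giving the nested "look-up tables of
# look-up tables" picture.

class LookupLearner:
    def __init__(self):
        self.table = {}  # the whole "model" is this dictionary

    def learn(self, x, y):
        self.table[x] = y  # just store the special case

    def predict(self, x):
        # On inputs it has never seen, a pure look-up table can only abstain.
        return self.table.get(x)

# Nesting: the value stored under a context is itself another learner,
# so the top-level table dispatches to per-context sub-tables.
inner = LookupLearner()
inner.learn("2+2", "4")
outer = LookupLearner()
outer.learn("arithmetic", inner)

print(outer.predict("arithmetic").predict("2+2"))  # -> "4" (memorized)
print(outer.predict("arithmetic").predict("3+3"))  # -> None (never seen)
```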
I view the no-free-lunch theorems as essentially asserting that there is only one method that learns in the worst case, namely the highly inefficient look-up table, and that in the general case there are no shortcuts to learning such a table: you must pay the full exponential cost in storage and time (in finite domains).
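A small finite-domain check of that counting intuition (a sketch under the standard no-free-lunch setup where every labelling of the domain is an equally possible target; the variable names are my own):

```python
from collections import defaultdict
from itertools import product

# Over the domain {0,1}^n there are 2^(2^n) possible boolean targets.
# Condition on having observed all inputs but one: for every possible
# training set, exactly half of the consistent targets label the unseen
# input 0 and half label it 1, so no learner beats chance there. Averaged
# over targets, nothing improves on storing the full 2^n-entry table.

n = 3
domain = list(product([0, 1], repeat=n))             # 2^n = 8 inputs
targets = list(product([0, 1], repeat=len(domain)))  # 2^(2^n) = 256 functions
observed, unseen = domain[:-1], domain[-1]

groups = defaultdict(lambda: [0, 0])  # training data -> label counts on unseen
for target in targets:
    labels = dict(zip(domain, target))
    key = tuple(labels[x] for x in observed)
    groups[key][labels[unseen]] += 1

print(len(domain), "inputs,", len(targets), "possible targets")
print(all(zeros == ones for zeros, ones in groups.values()))  # True
```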
https://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory
https://en.wikipedia.org/wiki/Tarski%E2%80%93Grothendieck_set_theory