What if, as Gwern proposes, intelligence is simply “search over the space of Turing machines”, i.e. AIXI? Right now this definition feels closest to the empirical realities of ML capabilities: ‘expert knowledge’ and ‘inductive bias’ have largely lost out to the cold realities of scaling compute and data.
All we are doing when we are doing “learning,” or when we are doing “scaling,” is that we’re searching over more and longer Turing machines, and we are applying them in each specific case.
Otherwise, there is no general master algorithm. There is no special intelligence fluid. It’s just a tremendous number of special cases that we learn and we encode into our brains.
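To make “searching over more and longer Turing machines” concrete, here is a toy sketch of my own (not Gwern’s, and not AIXI itself, which is incomputable): brute-force enumeration of short programs in a made-up three-instruction language, weighted by a 2^-length simplicity prior and used to predict the next element of an observed sequence. The mini-language and every name in it are invented purely for illustration.

```python
# Toy illustration of "intelligence as search over Turing machines" (Solomonoff/AIXI flavour).
# This is NOT AIXI -- real AIXI is incomputable -- just brute-force search over programs
# in a tiny made-up language, weighted by a 2^-length simplicity prior.
from itertools import product

OPS = {            # each "instruction" maps the current value x to a new value
    "+1": lambda x: x + 1,
    "*2": lambda x: x * 2,
    "^2": lambda x: x * x,
}

def run(program, x0, steps):
    """Apply the program (a tuple of op names) repeatedly to generate a sequence."""
    seq, x = [x0], x0
    for _ in range(steps):
        for op in program:
            x = OPS[op](x)
        seq.append(x)
    return seq

def predict_next(observed, max_len=3):
    """Weight every program of length <= max_len by 2^-length, keep those that
    reproduce the observed sequence, and return the highest-weight prediction."""
    predictions = {}
    for length in range(1, max_len + 1):
        for program in product(OPS, repeat=length):
            seq = run(program, observed[0], len(observed))  # one extra step gives the prediction
            if seq[:len(observed)] == observed:
                nxt = seq[len(observed)]
                predictions[nxt] = predictions.get(nxt, 0.0) + 2.0 ** -length
    return max(predictions, key=predictions.get) if predictions else None

print(predict_next([1, 2, 4, 8]))  # -> 16: the shortest consistent program is ("*2",)
```

Real AIXI replaces this enumeration with an uncomputable Solomonoff prior over all programs plus an expectimax over actions; the point of the toy is only that “learning” here is nothing but weighted search over programs consistent with the data.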
If this turns out to be correct, as opposed to something more ‘principled’ or ‘organic’ like natural abstractions or shard theory, why would we expect to be able to understand it (mechanistically or otherwise) at all? In this world we should be focusing much more on scary demos or evals or other things that seem robustly good for reducing X-risk.
I don’t think it can literally be AIXI/search over Turing machines, because that’s an extremely unrealistic model of how future AIs will work. But I do think a related claim is true: inductive biases mattered a lot less than we thought in the 2000s and early 2010s, and that matters.
The pitch for natural abstractions is that the compute limits of real AGIs/ASIs force them to use abstractions rather than brute-force simulation of the territory, combined with the hope that abstractions are closer to discrete than continuous, and the hope that other minds naturally learn these abstractions in pursuit of capabilities (I see LLMs as evidence that natural abstractions are relevant). But yes, I think a mechanistic understanding of how AIs work is likely not to exist in time, if at all, so I am indeed somewhat bearish on mech interp.
This is why I tend to favor direct alignment approaches like altering the data over approaches that rely on interpretability.
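As a very rough sketch of what “altering the data” could mean in practice (every path, pattern, and ratio below is a placeholder I made up, not a description of any actual pipeline): filter a pretraining corpus against patterns you don’t want the model to learn from, and interleave curated examples of the behavior you do want.

```python
# Hypothetical sketch of a data-level alignment pass: drop documents matching
# undesired-behaviour patterns and mix in curated examples of desired behaviour.
# Patterns, file paths, and the mixing ratio are placeholders, not a real recipe.
import json
import re

UNDESIRED = [
    re.compile(r"how to (build|synthesize) a (bomb|bioweapon)", re.I),  # placeholder patterns
    re.compile(r"step-by-step guide to hacking", re.I),
]

def keep(doc: dict) -> bool:
    """Drop documents whose text matches any undesired pattern."""
    return not any(p.search(doc["text"]) for p in UNDESIRED)

def curate(corpus_path: str, curated_path: str, out_path: str, mix_every: int = 100) -> None:
    """Stream the corpus, filter it, and periodically interleave curated aligned examples."""
    with open(curated_path) as f:
        curated = [json.loads(line) for line in f]
    with open(corpus_path) as src, open(out_path, "w") as dst:
        for i, line in enumerate(src):
            doc = json.loads(line)
            if keep(doc):
                dst.write(json.dumps(doc) + "\n")
            if curated and i % mix_every == 0:  # inject one curated example every mix_every docs
                dst.write(json.dumps(curated[(i // mix_every) % len(curated)]) + "\n")

# curate("pretrain.jsonl", "aligned_examples.jsonl", "pretrain_filtered.jsonl")
```

The real versions of this are obviously far more involved (learned classifiers rather than regexes, careful mixing ratios, and so on); the sketch is just meant to contrast “shape what goes in” with “interpret what comes out.”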