I don’t think any factored cognition proponents would disagree with “composing interpretable pieces does not necessarily yield an interpretable system.”
They just believe that we could, contingently, choose to compose interpretable pieces into an interpretable system (a toy sketch of what such composition might look like follows this list), just like we do all the time with:
- massive factories with billions of components, e.g. semiconductor fabs
- large software projects with tens of millions of lines of code, e.g. the Linux kernel
- military operations involving millions of soldiers and support personnel
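As a minimal sketch of the claim (not anything proposed in this thread), here is a toy factored-cognition-style pipeline in Python: every sub-step is small enough to audit on its own, and the composition rule is an explicit, inspectable trace rather than something hidden inside learned weights. The names `Step`, `solve`, `decompose`, `answer_leaf`, and `audit` are all hypothetical stand-ins for whatever trusted sub-solvers and bookkeeping a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One interpretable unit of work: a question, its answer, and its sub-steps."""
    question: str
    answer: str = ""
    children: List["Step"] = field(default_factory=list)


def solve(question: str,
          decompose: Callable[[str], List[str]],
          answer_leaf: Callable[[str], str]) -> Step:
    """Recursively decompose a question, solve the leaves, and keep the whole
    derivation as an explicit tree so a human can audit every step."""
    step = Step(question)
    subquestions = decompose(question)
    if not subquestions:  # small enough to answer directly
        step.answer = answer_leaf(question)
        return step
    step.children = [solve(q, decompose, answer_leaf) for q in subquestions]
    # The composition rule is deliberately trivial (concatenation); the point
    # is that it is explicit and auditable, not buried inside an opaque model.
    step.answer = " ".join(child.answer for child in step.children)
    return step


def audit(step: Step, depth: int = 0) -> None:
    """Print the derivation tree so each piece can be checked in isolation."""
    print("  " * depth + f"Q: {step.question} -> A: {step.answer}")
    for child in step.children:
        audit(child, depth + 1)


# Example with trivial stand-in solvers:
trace = solve(
    "Summarize the report",
    decompose=lambda q: (["Summarize section 1", "Summarize section 2"]
                         if q == "Summarize the report" else []),
    answer_leaf=lambda q: f"[summary of '{q}']",
)
audit(trace)
```

The point of the sketch is only that interpretability of the whole is a design choice about how pieces are wired together and logged, not an automatic consequence of the pieces being interpretable.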
“Figuring out how to turn interpretability/tool-ness/alignment/corrigibility of the parts into interpretability/tool-ness/alignment/corrigibility of the whole is the central problem, and it’s a hard (and interesting) open research problem.”
Agreed, this is the central problem, though I would describe it more as engineering than research: the fact that we have examples of massively complicated yet interpretable systems means we collectively “know” how to solve it, and it’s mostly a matter of assembling a large enough and coordinated-enough engineering project. (The real problem with factored cognition for AI safety is not that it won’t work, but that equally powerful uninterpretable systems might be much easier to build.)
Do we really have such good interpretations of those examples? It seems to me that we have big problems in the real world precisely because we don’t. We do have very high-level interpretations, but not enough to give solid guarantees. After all, we have a very high-level, trivial interpretation of our ML models: they learn! The challenge is not just to have clues, but to have clues that are relevant enough to address safety concerns at the scale of impact involved (which is the unprecedented feature of the AI field).