The main argument I see not to take this too literally is something like “conservation of computation.”
I think it’s quite likely that, left to its own devices, a black-box machine learning agent would learn to represent information in weakly hierarchical models that don’t have a simple graph structure (but maybe have some more complicated graph structure where representations are connected by complicated transition functions that also have side effects on the state of the agent). If we actually knew how to access these representations, that would already be really useful for interpretability, and we could probably figure out some broad strokes of what was going on, but we wouldn’t be able to get detailed guarantees. Nor would this be sufficient for trust—we’d still need a convincing story about how we’d designed the agent to learn to be trustworthy.
Then if we instead constrain a learning algorithm to represent information using simple graphs of human-understandable pieces, but it can still do all the impressive things the unconstrained algorithm could do, the obvious inference is that all of that human-mysterious complicated stuff that was happening before is still happening; it has just been swept as “implicit knowledge” into the parts of the algorithm that we didn’t constrain.
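To make the contrast concrete, here is a toy sketch (my own construction; the names AgentState, OpaqueNode, and QNRNode are hypothetical and not part of the QNR proposal) of the two pictures: representations linked by arbitrary transition functions with side effects on agent state, versus a simple graph of human-understandable pieces.

```python
# Toy sketch of the two kinds of representation discussed above.
# All names here are hypothetical illustrations, not part of any actual system.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    """Opaque internal state that transition functions may mutate."""
    scratch: Dict[str, float] = field(default_factory=dict)


# "Unconstrained" picture: representations linked by arbitrary transition
# functions that return the next representation and may also have side
# effects on the agent's state -- no simple, inspectable graph structure.
Transition = Callable[["OpaqueNode", AgentState], "OpaqueNode"]


@dataclass
class OpaqueNode:
    embedding: List[float]
    transitions: List[Transition] = field(default_factory=list)


# "Constrained" picture: a simple graph of human-understandable pieces,
# with labeled edges and no hidden side effects.
@dataclass
class QNRNode:
    label: str                                   # human-readable description
    edges: List[Tuple[str, "QNRNode"]] = field(default_factory=list)  # (relation, target)
```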
In other words, I think the use of QNR as a template for AI design provides limited benefit, because so long as the human-mysterious representation of the world is useful, and your learning algorithm learns to do useful things, it will try to cram the human-mysterious representation in somewhere. Trying to stop it is pitting yourself against the search algorithm optimizing the learned content, and this will become a less and less useful project as that search algorithm becomes more powerful. But interpreted not too literally, I think it’s a useful picture of how we might model an AI’s knowledge so that we can do broad-strokes interpretability.
Although I don’t understand what you mean by “conservation of computation”, the distribution of computation, information sources, learning, and representation capacity is important in shaping how and where knowledge is represented.
The idea that general AI capabilities can best be implemented or modeled as “an agent” (an “it” that uses “the search algorithm”) is, I think, both traditional and misguided. A host of tasks require agentic action-in-the-world, but those tasks are diverse and will be performed and learned in parallel (see the CAIS report, www.fhi.ox.ac.uk/reframing). Skill in driving somewhat overlaps with — yet greatly differs from — skill in housecleaning or factory management; learning any of these does not provide deep, state-of-the-art knowledge of quantum physics, and can benefit from (but is not a good way to learn) conversational skills that draw on broad human knowledge.
A well-developed QNR store should be thought of as a body of knowledge that potentially approximates the whole of human and AI-learned knowledge, as well as representations of rules/programs/skills/planning strategies for a host of tasks. The architecture of multi-agent systems can provide individual agents with resources that are sufficient for the tasks they perform, but not orders of magnitude more than necessary, shaping how and where knowledge is represented. Difficult problems can be delegated to low-latency AI cloud services.
There is no “it” in this story, and classic, unitary AI agents don’t seem competitive as service providers — which is to say, don’t seem useful.
I’ve noted the value of potentially opaque neural representations (Transformers, convnets, etc.) in agents that must act skillfully, converse fluently, and so on, but operationalized, localized, task-relevant knowledge and skills complement rather than replace knowledge that is accessible by associative memory over a large, shared store.
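As a purely illustrative sketch of what “associative memory over a large, shared store” could look like at the code level (the SharedStore class and its add/query methods are hypothetical names of my own, not an interface from the QNR report): agents hold only task-local skills and retrieve everything else by embedding similarity against a shared store.

```python
# Minimal sketch of associative retrieval over a shared store.
# SharedStore, add, and query are hypothetical names, not an actual QNR API.
import numpy as np


class SharedStore:
    """Shared store of labeled representation vectors, queried by similarity."""

    def __init__(self):
        self.labels: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, label: str, vector: np.ndarray) -> None:
        # Store a unit-normalized vector so a dot product equals cosine similarity.
        self.labels.append(label)
        self.vectors.append(vector / np.linalg.norm(vector))

    def query(self, vector: np.ndarray, k: int = 3) -> list[str]:
        """Return the labels of the k stored entries most similar to the query."""
        q = vector / np.linalg.norm(vector)
        sims = np.array([v @ q for v in self.vectors])
        top = np.argsort(-sims)[:k]
        return [self.labels[i] for i in top]


# Usage sketch: a task-specific agent embeds its current situation (embedding
# model not shown) and looks up relevant shared knowledge rather than storing
# all of it locally; hard cases could then be delegated to a cloud service.
# store = SharedStore()
# store.add("route planning under snow", np.random.randn(64))
# nearest = store.query(np.random.randn(64), k=3)
```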