What program structures enable efficient induction?

previously: My decomposition of the alignment problem

A simple model of meta/continual learning

In the framework of Solomonoff induction, we observe an infinite stream of bits and try to predict the next bit by finding the shortest hypothesis that reproduces our observations so far (some caveats here). Each additional bit of observation lets us, in principle, rule out an infinite number of hypotheses (namely, all programs that failed to predict it), which creates an opportunity to speed up induction on future observations. Specifically, as we search for the shortest program that also predicts the next bit, we can learn to skip over programs that past observations have already falsified. Learning how to skip over falsified programs costs time and computation upfront, but it can pay dividends in computational efficiency for future induction.

This is my mental model for how agents can “learn how to learn efficiently”: an agent that has received more observations can usually adapt to new situations more quickly, because more incorrect hypotheses have already been ruled out, leaving a narrower set of remaining hypotheses to choose from.
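To make the skipping idea concrete, here is a minimal toy sketch. It is not Solomonoff induction (which is uncomputable): as a simplifying assumption, the hypothesis class is just finite repeating bit patterns, enumerated shortest first. The names `patterns`, `naive_shortest`, and `PrunedInductor` are hypothetical and chosen only for illustration. The naive search restarts the enumeration for every prediction, while the pruned version pays an upfront cost to hold the candidate list and then tests each survivor against the new bit only.

```python
from itertools import product

def patterns(max_len):
    """Enumerate toy hypotheses (repeating bit patterns), shortest first."""
    for n in range(1, max_len + 1):
        yield from product((0, 1), repeat=n)

def predict(pattern, t):
    """A pattern predicts bit t by repeating itself forever."""
    return pattern[t % len(pattern)]

def naive_shortest(observations, max_len):
    """Restart the enumeration from scratch; return (hypothesis, checks)."""
    checks = 0
    for p in patterns(max_len):
        checks += 1
        if all(predict(p, t) == b for t, b in enumerate(observations)):
            return p, checks
    return None, checks

class PrunedInductor:
    """Keeps only unfalsified hypotheses, so later searches skip the rest."""

    def __init__(self, max_len):
        self.survivors = list(patterns(max_len))  # upfront cost
        self.t = 0

    def observe(self, bit):
        # Survivors already match all earlier bits, so only the new bit is tested.
        self.survivors = [p for p in self.survivors if predict(p, self.t) == bit]
        self.t += 1

    def shortest(self):
        # Filtering preserves enumeration order, so the first survivor is shortest.
        return self.survivors[0] if self.survivors else None

stream = [0, 1, 1, 0, 1, 1, 0, 1]
inductor = PrunedInductor(max_len=4)
for bit in stream:
    inductor.observe(bit)
    print(inductor.t, len(inductor.survivors), inductor.shortest())
```

In this toy setting the survivor list can only shrink, so each new observation is cheaper to process than the last. The open question in the text is what replaces the explicit list when the hypothesis space is too large to store.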

More generally, an important question is: given that the space of remaining hypotheses keeps shrinking as we receive new observations, what data structures for representing hypotheses should we use to exploit that? How should we represent programs if we want not just to execute them, but also to modify them into other plausible hypotheses? If a world model is selected for its ability to adapt quickly to new environments, what is the type signature of that world model?

Quick thoughts

  • Incremental modification: In Solomonoff induction, the next shortest program that predicts the next bit of observation might look nothing like the current shortest program that reproduces the existing observations. However, modifying and augmenting the current program seems much more efficient than searching for a new program from scratch, and it seems much closer to how animals and humans update their knowledge in practice. Is there a way to structure programs that allows us to learn by incrementally modifying our existing hypothesis? Can we do this without sacrificing the expressivity of our hypothesis space?

  • Modularity: A modular program structure can be broken down into loosely coupled components, where each component influences only a few other components, leaving most other components invariant at any given time. This property can help with efficient learning because when a modular program encounters a prediction error, only a small part of it could be responsible for that error, which means we only need to modify a small part of our program to accommodate each new observation.

  • Compression: If we picture Solomonoff induction as enumerating bitstrings as programs from shortest to longest, then one way to “skip over falsified hypotheses” is to enumerate bitstrings under a compressed encoding that ignores falsified programs, so that shorter bitstrings correspond to likelier hypotheses that have not been ruled out. Unfortunately, learning this encoding induces another induction problem, but we can still reap the benefits insofar as we can efficiently find a generalizable approximation of the encoding.

  • Closing the loop: Solomonoff induction can be framed as compression over the space of observations, while approximating the compressed encoding is essentially compression over program space. We can continue this recursion by approximating a compressed encoding over the space of encodings (which would allow us to update our encodings based on observations more efficiently), then approximating another compressed encoding over those encodings in turn, and so on. This is one picture of how we can perform meta-learning at all levels and learn meta-patterns at increasing levels of abstraction.
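The compression idea above can be sketched in a toy setting. The assumptions are loud ones: hypotheses are again finite repeating bit patterns rather than arbitrary programs, and the helper names (`codes`, `hypotheses`, `consistent`, `build_codebook`) are hypothetical. The codebook reassigns the shortest available bitstrings to the hypotheses that survive the observations, so enumerating codes from short to long never visits a falsified hypothesis.

```python
from itertools import count, product

def codes():
    """All bitstrings in shortlex order: '', '0', '1', '00', '01', ..."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def hypotheses(max_len):
    """Toy hypothesis space: nonempty bit patterns, shortest first."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def consistent(pattern, observations):
    """The pattern, repeated forever, must reproduce every observed bit."""
    return all(pattern[t % len(pattern)] == b for t, b in enumerate(observations))

def build_codebook(observations, max_len):
    """Map the shortest codes onto the hypotheses that are still unfalsified."""
    survivors = [h for h in hypotheses(max_len) if consistent(h, observations)]
    return dict(zip(codes(), survivors))

codebook = build_codebook("0110", max_len=4)
print(codebook)
```

Note that this sketch builds the codebook by brute force, testing every hypothesis, which is exactly the cost the bullet says must be avoided. A useful version would have to approximate the encoding without enumerating what it skips, and that is the induction problem the text points to.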

Why this might be relevant for alignment

Transformative AI will often need to modify its ontology in order to accommodate new observations, which means that if we want to translate our preferences over real-world objects into the AI’s world model, we need to be able to stably “point” to real-world objects despite ontology shifts. If efficient learning relies on specific data structures for representing hypotheses, these structures may reveal properties that remain invariant under ontology shifts. By identifying these invariant properties, we can potentially create robust ways to maintain our preferences within the AI’s evolving world model.

Furthermore, insofar as humans utilize a similar data structure to represent their world models, this could provide insights into how our actual preferences remain consistent despite ontology shifts, offering a potential blueprint for replicating this process in AI.