On knowledge extrapolation in AI models

TL;DR: Humans have been able to extrapolate knowledge (i.e. discover or create something qualitatively new) through a large number of iterations over centuries. Neural networks have inherent difficulty extrapolating knowledge beyond their training datasets and need continuous feedback to get the extrapolation right. I speculate that this feedback requirement might be one of the non-minor factors in AGI development and regulation.
From my very first introduction to the machine learning world, I was taught that neural networks are good at interpolating data but bad at extrapolating it.
First, a simple mathematical example. Say we are an artificial neural network (ANN) and are given the data points (x, f(x)):
{(−2.0, 1.2); (−1.1, 1.8); (0.0, 3.1); (0.9, 3.7)}
If we are asked to predict the unknown value of f(x) at x = 0.5, it is reasonable to hypothesize that the function f: x → y in between these points is roughly f(x) = x + 3 (the reasonableness comes from requiring the function not to oscillate wildly, otherwise it would be akin to high-amplitude noise), and therefore f(0.5) ≈ 3.5.
But if we are instead asked to predict f(100), we actually have very little idea of how the function might behave in that region. It is possible for the function to pass smoothly through many different values there, and we have no way of telling which of them is more correct. And this is only a 1D problem! In real ML and AI models, where the dimensionality of the data space may be 1000D or more, it only gets worse (though the dimensionality of the data manifold is usually much lower; more on data manifolds in ANNs here).
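To make this concrete, here is a minimal numerical sketch of the underdetermination. It substitutes simple polynomial fits for a neural network, which is purely an illustrative choice and not part of the argument itself (assumes NumPy):

```python
import numpy as np

# The four known datapoints from the example above.
x = np.array([-2.0, -1.1, 0.0, 0.9])
y = np.array([1.2, 1.8, 3.1, 3.7])

# Three different smooth hypotheses, all roughly consistent with the data:
# a line, a quadratic, and a cubic (the cubic passes through all four points exactly).
for degree in (1, 2, 3):
    p = np.poly1d(np.polyfit(x, y, degree))
    print(f"degree {degree}: f(0.5) = {p(0.5):8.2f}   f(100) = {p(100):14.1f}")

# All three hypotheses give f(0.5) close to 3.5 (interpolation),
# while their values at f(100) disagree wildly (extrapolation).
```

Nothing in the four datapoints themselves tells us which of these hypotheses to trust at x = 100.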
(A 3D example with a visible 2D strip. If this were a model's inner manifold, the model could produce outputs on the strip confidently, but getting closer to its edges would be problematic. And how does this strip continue where there is no more training data? The model simply can't know.)
Translating this example back into words: if a neural network has some pieces of knowledge, it should not be very difficult for it to derive some other knowledge from a combination of these fragments. (Note: hereafter I use the terms 'knowledge' and 'knowledge manifold' as shorthand for 'knowledge, skills, and creativity'.) Discovering something qualitatively new, however, is expected to be a significantly harder problem.
Indeed, what we observe in the current capabilities of various models is mostly interpolation. Models draw images, but only in already-invented styles. Models create music, but only in genres already introduced by humans. Models write texts, but only if they have already seen something similar. For instance, an ANN trained only on acoustic guitar music would probably never invent rock music with distorted guitars. All of this may sound obvious, but if we are talking about whether AGI/ASI is possible at all, then for AGI to exist this 'no-extrapolation' property has to break down somewhere, somehow.
There are, however, cases where a model may discover something new, such as reinventing an unpublished quantum algorithm or inventing better solutions for math problems. But this is not necessarily an extrapolation of knowledge: it may just as well be an interpolation of known facts to a new point, one that humans simply have not reached yet. As AI capabilities grow, even during the pre-AGI era, most if not all of these interpolation blind spots on the map of human knowledge will be revealed.
But are the natural neural networks in human brains any different? I believe they are not. Not many people come up with discoveries, inventions, or new styles and genres of art. For most of human history it took generations to figure out what the laws of nature are, what kind of art looks pleasant, and which mechanisms work properly.
And I think this hints at a key property that we had and still have, and that ANNs do not (yet?) have in full: feedback. For each genius there were a large number of people who came up with descriptions of the laws of nature that failed experimentally, who painted in a crazy new style that nobody liked, who built a never-before-seen device that simply did not function the way it was supposed to. Using the math example as a metaphor, these people tried out different hypotheses not even for f(100), but just for f(2), and most of them did not work; some did, though, and that is how we were able to firmly establish the value of f(2) that we have known ever since.
This reasoning suggests that a capability for high-quality feedback may be one of the keys, and one of the limitations, for AGI/ASI. If we wanted to create an AGI/ASI, it would be necessary to:
Tie training and inference closer together. If a model produces something outside its initial training manifold, the result should not be thrown away (and afterwards averaged to zero among myriads of other responses); instead it should be carefully examined and incorporated into the model (either as a positive or a negative datapoint) as quickly as possible, so as to increase the span of the "current knowledge" manifold and gain the ability to use this datapoint as an interpolation node (see Incremental learning, and the sketch after this list).
Get feedback from humans to learn what is correct and what is not, which means it is at least not clear how to accelerate this step past human speed. This is especially true for creative tasks, which I assume are done solely to satisfy humans, so only people have the ability to decide whether a new AI output is brilliant or garbage (see Active learning). Beyond that, this point a) limits the feedback rate to the speed of human-AI interaction and b) seems loosely connected to the alignment problem: if human feedback is used, it must be used properly, and it may be easier to regulate humans than AI.
Two notes on this point, inspired by the fine-tuning section of the LLaMA-2 paper. First, it shows that models benefit from high-quality inputs: during fine-tuning, "the model's performance is capped by the writing abilities of the most skilled annotator". My hypothesis is that greater writing skill somehow corresponds to the edge of the model's knowledge set. Showing the model mediocre annotations therefore gives it very little, since there are already datapoints around the new one in its knowledge space. But showing it a unique, high-quality annotation provides a new point beyond its current knowledge manifold, and therefore allows that manifold to expand.
Second, there are also hints that I may be wrong and humans are not that necessary, at least for LLMs: "we found that the outputs sampled from the resulting SFT [supervised fine-tuning] model were often competitive with SFT data handwritten by human annotators". A similar piece of evidence is OpenAI fine-tuning GPT-4 with GPT-2, though its performance was still lower than that of GPT-4 trained with human reinforcement.
Spend time collecting feedback in a number of areas, such as physics: it takes time to build testing facilities and to conduct hypothesis-checking experiments. Currently, high-precision experiments are built and run by humans, and neither the construction nor the accurate lab measurements are directly accelerated by AI (though the design does get faster and better). This may change when robots with human-level or better precision come into play. I will not go deeper into this here, but progress in robotics feels like another important factor for some areas.
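As referenced in the first item, here is a minimal sketch of what such a training-inference-feedback loop could look like. Everything in it is a hypothetical stand-in rather than anything from this post: the "knowledge manifold" is reduced to the x-range covered by the data, feedback_oracle stands in for a slow human judgment or physical experiment, and scikit-learn's partial_fit plays the role of incremental learning.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Hypothetical setup: the model initially only "knows" the region x in [-2, 1],
# where the (noisy) ground truth happens to be f(x) = x + 3, as in the toy example.
X_known = rng.uniform(-2.0, 1.0, size=(50, 1))
y_known = X_known[:, 0] + 3.0 + rng.normal(0.0, 0.1, size=50)

def feedback_oracle(x: float) -> float:
    """Stand-in for slow, expensive feedback: a human judgment or a physical experiment."""
    return x + 3.0

def novelty(x_new: float, X_seen: np.ndarray) -> float:
    """Distance from a candidate to the nearest datapoint seen so far."""
    return float(np.min(np.abs(X_seen[:, 0] - x_new)))

model = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
model.fit(X_known, y_known)

for step in range(50):
    x_candidate = rng.uniform(-2.0, 5.0)                  # sometimes far outside the training range
    if novelty(x_candidate, X_known) > 0.5:               # candidate lies beyond the current "manifold"
        y_feedback = feedback_oracle(x_candidate)         # query the slow external feedback
        model.partial_fit([[x_candidate]], [y_feedback])  # incorporate it without full retraining
        X_known = np.vstack([X_known, [[x_candidate]]])

print("prediction at x = 4.0:", model.predict([[4.0]])[0])
```

The only point of the sketch is the shape of the loop: check whether a candidate lies beyond the known data, obtain slow external feedback for those candidates only, and fold the answer back into the model immediately instead of discarding it.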
Some of these points could potentially be a) limiting factors that would, on their own, stop AGI/ASI development or at least slow it down to below-singularity speed, and b) key aspects to regulate if we wanted to control the rate of AGI/ASI development. At the moment they are not very influential, since humanity has not yet learned to extrapolate knowledge via ANNs quickly, but they may become important sometime soon.