Rambling about forecasting the order in which functions are learned by NNs
Idea:
Using function complexity and their "compoundness" (edit, 11 September: these functions seem to be called "composite functions"), we may be able to forecast the order in which algorithms are learned by NNs, and the temporal ordering of when some functions or behaviours will start generalising strongly.
Rambling:
What happens when training neural networks is similar to the selection of genes in genomes, or to any reinforcement-style optimization process. Compound functions are much harder to learn: you need each part to be independently useful initially to provide enough signal for the compound system to be reinforced.
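To make the "each part must be independently useful" point concrete, here is a small numpy check (my own illustration, not from the original post): for a composite target like the sign of a product of standard normal inputs, each individual input is uncorrelated with the label as soon as two or more factors are involved, so no single part of a would-be circuit gets any signal on its own.

```python
# Each factor of the composite target sign(x1 * ... * xk) is individually
# uninformative about the label once k >= 2.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

for k in (1, 2, 4):
    x = rng.standard_normal((n, k))
    y = np.sign(np.prod(x, axis=1))
    corr = np.corrcoef(x[:, 0], y)[0, 1]  # correlation of one factor with the label
    print(f"k={k}: corr(x1, y) = {corr:+.3f}")
# Expected output: roughly +0.80 for k=1 (E[|x|] = sqrt(2/pi)), ~0.00 for k >= 2.
```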
That means that learning any non-hardcoded algorithm with many variables and multiplicative steps is very difficult.
An important factor here is how frequently an algorithm is useful, and to what extent. An algorithm that is useful in most situations will receive much more training signal. The relative strength of the reward signal matters because of noise in training and because of catastrophic forgetting.
LLMs are not learning complex algorithms yet. They are learning something like a world model, because a world model is useful for most tasks and can be built by first building each part separately and then assembling them.
Regarding algorithms that exploit this world model: they can be learned later if they are composed of very simple algorithms that can be assembled afterwards. An extra difficulty for LLMs in learning algorithms arises in situations where heuristics already work very well. In that case, you need to add significant regularisation pushing for simpler circuits; then you may observe grokking and a transition from heuristics to algorithms.
An issue with this reasoning is that heuristics are themselves 1-step algorithms (0 compoundness).
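For concreteness, here is a minimal sketch of the kind of setup in which grokking is typically reported (the task is the modular addition benchmark from Power et al. 2022; the architecture and hyperparameters are my guesses, not the post's): a small network that can memorise its training set, with strong weight decay as the regularisation pushing towards simpler circuits.

```python
# Sketch of a grokking-style setup: memorisable task + strong weight decay.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(pairs) // 2], perm[len(pairs) // 2 :]

def encode(ab):
    # Concatenated one-hot encodings of the two operands.
    return torch.cat([nn.functional.one_hot(ab[:, 0], p),
                      nn.functional.one_hot(ab[:, 1], p)], dim=1).float()

x_train, y_train = encode(pairs[train_idx]), labels[train_idx]
x_test, y_test = encode(pairs[test_idx]), labels[test_idx]

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
# Strong weight decay is the regularisation hypothesised to push the model
# from the memorising solution towards the simpler, generalising circuit.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(100_000):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()
    if step % 10_000 == 0:
        with torch.no_grad():
            acc = (model(x_test).argmax(-1) == y_test).float().mean()
        print(f"step {step:6d}  train loss {loss.item():.4f}  test acc {acc.item():.3f}")
# The grokking signature: train loss collapses early while test accuracy
# stays near chance, then test accuracy jumps much later.
```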
Effects:
- Frequency of reward
- Strength of the additional reward (above the “heuristic baseline”)
- Compoundness
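Purely as an illustration of how these three effects might combine (the functional form and numbers below are invented, not derived from anything), one could score candidate functionalities and read a forecast ordering off the scores:

```python
# Toy, invented scoring heuristic: higher score = expected to be learned
# (and to start generalising) earlier.
def learning_order_score(reward_frequency, reward_strength_above_heuristic, compoundness):
    """Frequent, strongly rewarded, low-compoundness functions come first.
    The July 6th experiment below suggests each +2 of compoundness costs
    roughly one OOM of training time, hence the 10 ** (compoundness / 2)
    penalty."""
    return reward_frequency * reward_strength_above_heuristic / 10 ** (compoundness / 2)

# Example: a world model (frequent, built from simple parts) vs. a
# long-horizon narrow capability (rare, highly compound).
print(learning_order_score(0.9, 1.0, compoundness=1))    # learned early
print(learning_order_score(0.05, 1.0, compoundness=6))   # learned very late
```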
Forecasting game:
(WIP, mostly a failure at this point)
Early to generalize well:
World models: they can be built from simple parts and are valuable most of the time.
Generalizable algorithms for simple and frequent tasks on which heuristics fail dramatically: ??? (maybe) generating random numbers, ??
Medium to generalize well:
Generalizable deceptive alignment algorithms: they require several components to work, but they are useful for many tasks. The strength of the additional reward is neither especially high nor low.
Generalizable instrumental convergence algorithms: same as deceptive alignment.
Generalizable short horizon algorithms: by definition they require fewer sequential steps, so they should be less "compounded" functions and appear sooner.
Late:
Generalizable long horizon algorithms: by definition they require more sequential steps, so they should be more "compounded" functions and appear later.
The latest:
Generalizable long horizon narrow capabilities: They are not frequently reinforced.
(Time spent on this: 45min)
July 6th update:
Here is a quick experiment trying to observe the effect of increasing "compoundness" on the order in which different functions are grokked: https://colab.research.google.com/drive/1B85mfCkqyQZSl1JGbLr0r5BrAS8LYUr5?usp=sharing
The task is predicting the sign of the product of 1 (function 1) to 8 (function 8) standard normal random variables.
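For reference, here is a minimal sketch of that setup (the linked Colab may differ in architecture and hyperparameters, which here are my guesses): train a small MLP on the sign-of-product task for several values of k and record when its test accuracy first jumps above a threshold.

```python
# Sweep the "compoundness" k and record when the model starts generalising.
import torch
import torch.nn as nn

def steps_to_generalise(k, n_train=1024, max_steps=200_000, threshold=0.9):
    torch.manual_seed(0)
    x_train = torch.randn(n_train, k)
    y_train = (x_train.prod(dim=1) > 0).float()
    x_test = torch.randn(8192, k)
    y_test = (x_test.prod(dim=1) > 0).float()

    model = nn.Sequential(nn.Linear(k, 256), nn.ReLU(), nn.Linear(256, 1))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()

    for step in range(max_steps):
        opt.zero_grad()
        loss_fn(model(x_train).squeeze(-1), y_train).backward()
        opt.step()
        if step % 1_000 == 0:
            with torch.no_grad():
                preds = (model(x_test).squeeze(-1) > 0).float()
            if (preds == y_test).float().mean() > threshold:
                return step  # first checkpoint above the accuracy threshold
    return None  # did not generalise within the step budget

for k in (1, 2, 4, 6, 8):
    print(f"k={k}: first generalises at step {steps_to_generalise(k)}")
```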
Quick results:
Increasing the compoundness by 2 seems to delay the grokking by something like 1 OOM.
Hm, what do you mean by "generalizable deceptive alignment algorithms"? I understand 'algorithms for deceptive alignment' to be algorithms that enable the model to perform well during training because alignment-faking behavior is instrumentally useful for some long-term goal. But that seems to suggest that deceptive alignment would only emerge – and would only be "useful for many tasks" – after the model learns generalizable long-horizon algorithms.
These forecasts are about the order in which functionalities see a jump in their generalization (how far OOD they work well).
By "generalizable xxx" I meant the form of the functionality xxx that generalizes far.