One way in which “spending a whole lot of time working with a system / idea / domain, and getting to know it and understand it and manipulate it better and better over the course of time” could be solved automatically is just by having a truly huge context window. Example of an experiment: teach a particular branch of math to an LLM that has never seen that branch of math.
Maybe humans just have the equivalent of a sort of huge context window spanning selected stuff from their entire lifetimes, and so this kind of learning is possible for them.
I don’t think it is sensible to model humans as “just the equivalent of a sort of huge context window”, because this is not a particularly good computational model of how human learning and memory work. But I do think the technology behind the increasing context size of modern AIs contributes to them having a small but nonzero amount of the thing Steven is pointing at, due to the spontaneous emergence of in-context learning algorithms.
You also have a simple algorithmic problem. Humans learn by replacing bad policy with good: a baby replaces “policy that drops objects picked up” with “policy that usually results in object retention”.
This is because, at a mechanistic level, the baby tries many times to pick up and retain objects, and within a fixed amount of circuitry in its brain the connections that led to a drop are down-weighted while the ones that led to retention are reinforced.
This means that as the baby learns, the compute cost of motor manipulation stays constant over time. Technically O(1) in the number of past attempts, though that's a bit of a confusing way to express it.
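To make the constant-cost point concrete, here is a minimal toy sketch in Python. It is not a model of actual neural circuitry; the action names, success probabilities, and learning rate are all made up. The point is only that the history of attempts lives in a fixed set of weights, so each new attempt costs the same no matter how many came before.

```python
import random

# Toy "grasping policy": a fixed set of weights over two motor programs.
# Each attempt costs the same amount of compute regardless of how many
# attempts came before -- the history lives in the weights, not in a log.
weights = {"loose_grip": 0.0, "firm_grip": 0.0}
LEARNING_RATE = 0.1

def attempt_pickup():
    # Pick the currently highest-weighted motor program (ties broken randomly).
    action = max(weights, key=lambda a: (weights[a], random.random()))
    retained = random.random() < (0.2 if action == "loose_grip" else 0.8)
    # Reinforce on success, down-weight on a drop; O(1) work per attempt.
    weights[action] += LEARNING_RATE * (1.0 if retained else -1.0)
    return action, retained

for _ in range(1000):
    attempt_pickup()
print(weights)  # firm_grip ends up dominant; per-attempt cost never grew
```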
With in-context-window learning, you can instead imagine an LLM+robot recording:
Robotic token string: <string of robotic policy tokens 1> : outcome, drop
Robotic token string: <string of robotic policy tokens 2> : outcome, retain
Robotic token string: <string of robotic policy tokens 3> : outcome, drop
And so on, extending until it consumes the machine's entire context window, and every time the machine decides which tokens to emit next it needs O(n log n) compute to consider all n tokens in the window. (It used to be O(n²); this is a huge advance.)
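Here is a toy sketch of that failure mode, with made-up token strings and an abstract stand-in for attention cost; the point is only that the per-decision cost grows with the length of the transcript, whatever the exact exponent.

```python
# Toy sketch of the in-context version: every outcome is appended to a
# transcript, and each new decision has to attend over the whole thing.
transcript = []  # grows without bound; nothing is ever distilled into weights

def record_outcome(policy_tokens, outcome):
    transcript.append((policy_tokens, outcome))

def decide_next_action():
    # Stand-in for attention over the context window: cost grows with
    # len(transcript), whether that growth is O(n^2), O(n log n), or O(n).
    cost = len(transcript)
    successes = [p for p, o in transcript if o == "retain"]
    return (successes[-1] if successes else "random exploration"), cost

record_outcome("<policy tokens 1>", "drop")
record_outcome("<policy tokens 2>", "retain")
record_outcome("<policy tokens 3>", "drop")
_, cost = decide_next_action()
print(cost)  # per-decision cost keeps climbing as the transcript grows
```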
This does not scale. You will not get capable or dangerous AI this way. Obviously you need to compress that linear list of outcomes from different strategies into an update to the underlying network that generated them, so that it becomes more likely to output tokens that result in success.
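A toy sketch of that compression step, using a REINFORCE-style softmax update as an illustrative stand-in (the policy indices, reward values, and learning rate are invented, not anything from a real system): the batch of outcomes is folded into the weights and can then be discarded.

```python
import numpy as np

# Toy "compression" step: fold a batch of (policy, outcome) records into the
# weights of the generating network, then throw the records away.
rng = np.random.default_rng(0)
logits = rng.normal(size=3)          # scores for 3 candidate policies
LEARNING_RATE = 0.5

def update_from_outcomes(records):
    """REINFORCE-style sketch: raise the probability of policies that
    retained the object, lower it for policies that dropped it."""
    global logits
    for policy_idx, outcome in records:
        reward = 1.0 if outcome == "retain" else -1.0
        probs = np.exp(logits) / np.exp(logits).sum()
        grad = -probs
        grad[policy_idx] += 1.0          # d log p(policy) / d logits
        logits = logits + LEARNING_RATE * reward * grad

records = [(0, "drop"), (1, "retain"), (2, "drop")]
for _ in range(5):
    update_from_outcomes(records)
# After the update the transcript is no longer needed: acting is O(1) in the
# number of past attempts, because the lessons now live in `logits`.
print(np.argmax(logits))  # the retained policy (index 1) now scores highest
```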
Same for any other task you want the model to do. In-context learning scales poorly. This also makes it safe...