Is it likely possible to find better RL algorithms, assisted by mediocre answers, and then to use those RL algorithms to design heterogeneous cognitive architectures?
Given that humans on their own haven’t yet found these better architectures, humans + imitative AI doesn’t seem like it would find the problem trivial.
And it’s not totally clear that these “better RL” algorithms exist, especially if you are looking at variations of existing RL rather than the space of all possible algorithms. Maybe something pretty fundamentally new is needed.
There are lots of ways to design all sorts of complicated architectures. The question is how well they work.
I mean this stuff might turn out to work. Or something else might work. I’m not claiming the opposite world isn’t plausible. But this is at least a plausible point to get stuck at.
If you can do this and it works, the RSI continues with diminishing returns each generation as you approach an asymptote limited by compute and data.
Seems like there are 2 asymptotes here.
Crazy smart superintelligence, and still fairly dumb in a lot of ways, not smart enough to make any big improvements. If you have a simple evolutionary algorithm and a test suite, it could recursively self-improve, tweaking its own mutation rate, child count, and other hyperparameters. But it’s not going to invent gradient-based methods, just do some parameter tuning on a fairly dumb evolutionary algorithm.
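To make that lower asymptote concrete, here is a minimal sketch (the fitness function and all constants are invented for illustration) of an evolutionary algorithm that recursively tunes its own mutation rate and child count:

```python
import random

def fitness(genome):
    # Stand-in test suite: higher is better, optimum at all-zeros.
    return -sum(x * x for x in genome)

def self_improving_evolution(generations=200, genome_len=8):
    parent = [random.uniform(-1, 1) for _ in range(genome_len)]
    mutation_rate, child_count = 0.5, 4   # the algorithm's "self"
    for _ in range(generations):
        children = []
        for _ in range(child_count):
            child = [g + random.gauss(0, mutation_rate) for g in parent]
            # The RSI step: children also carry mutated hyperparameters.
            children.append((child,
                             mutation_rate * random.uniform(0.5, 2.0),
                             max(2, child_count + random.randint(-1, 1))))
        best = max(children, key=lambda c: fitness(c[0]))
        if fitness(best[0]) > fitness(parent):
            parent, mutation_rate, child_count = best
    return parent, mutation_rate, child_count

print(self_improving_evolution())
```

No amount of this self-tuning introduces gradients; the search stays inside the box of knobs it started with.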
Since robots build compute and collect data, your rate of ASI improvement is ultimately limited by your robot production. (Humans stand in as temporary robots until they are no longer meaningfully contributing to the total.)
This is kind of true. But by the time there are no big algorithmic wins left, we are in the crazy smart, post-singularity regime.
RSI is a thing that happens. But it needs quite a lot of intelligence to start. Quite possibly more intelligence than needed to automate most of the economy.
A lot of newcomers may outperform LLM experts as they find better RL algorithms through automated search.
Possibly. Possibly not. Do these better algorithms exist? Can automated search find them? What kind of automated search is being used? It depends.
Given that humans on their own haven’t yet found these better architectures, humans + imitative AI doesn’t seem like it would find the problem trivial.
Humans on their own have already invented better RL algorithms for optimizing at the network architecture layer.
https://arxiv.org/pdf/1611.01578.pdf (background)
https://arxiv.org/pdf/1707.07012.pdf (see page 6)
With NAS, the RL model learns the relationship between [architecture] and [predicted performance]. I’m not sure how much transfer learning is done, but you must sample the problem space many times.
You get a plot of all your tries, like the one on page 6 of the second paper; the red dot is the absolute max for human-designed networks by the DL experts at Waymo in 2017.
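For intuition, here is a toy version of that loop (the slot names, proxy score, and learning rate are invented; the controller in the papers is an RNN trained on actual child-network accuracies):

```python
import math, random

# One choice per slot defines an architecture; the controller holds a
# logit per option and samples architectures from the softmax.
SLOTS = {"depth": [4, 8, 16], "width": [64, 128, 256], "op": ["conv3", "conv5", "sep3"]}
logits = {s: [0.0] * len(opts) for s, opts in SLOTS.items()}

def sample_arch():
    return {s: random.choices(range(len(opts)),
                              weights=[math.exp(l) for l in logits[s]])[0]
            for s, opts in SLOTS.items()}

def proxy_score(arch):
    # Stand-in for "train the child network, measure validation accuracy".
    target = {"depth": 1, "width": 2, "op": 2}
    return sum(arch[s] == target[s] for s in SLOTS) / 3 + random.gauss(0, 0.05)

baseline = 0.0
for _ in range(500):
    arch = sample_arch()
    reward = proxy_score(arch)
    baseline = 0.9 * baseline + 0.1 * reward        # variance-reduction baseline
    for s in SLOTS:                                 # REINFORCE on the logits
        total = sum(math.exp(l) for l in logits[s])
        for i in range(len(SLOTS[s])):
            grad = (i == arch[s]) - math.exp(logits[s][i]) / total
            logits[s][i] += 0.1 * (reward - baseline) * grad

print({s: opts[max(range(len(opts)), key=lambda i: logits[s][i])]
       for s, opts in SLOTS.items()})
```

The controller ends up sampling high-reward architectures far more often than random search would, which is the relationship between [architecture] and [predicted performance] being learned.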
Summary: I was speculating that a more advanced version of an existing technique might work
Tweaking its own mutation rate, child count, and other hyperparameters. But it’s not going to invent gradient-based methods
It can potentially output any element in its library of primitives. The library of primitives is what you get from the “mediocre answers” approach you criticized: re-implementing every machine learning paper ever published without code, porting all the papers that did publish code, and testing everything in a common environment. Also, the IT staff you would otherwise need, and other roles, are being filled by AI.
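A sketch of the distinction, with a hypothetical primitives registry (names and numbers invented): the searcher can emit any composition of what is in the library, so whether gradient methods can appear in its output depends entirely on whether gradient primitives were ported in.

```python
import random

# Hypothetical primitives library: each entry is one technique ported
# from some paper into a common interface (here: a loss -> loss map).
PRIMITIVES = {
    "sgd_step":      lambda loss: loss * 0.90,
    "momentum_step": lambda loss: loss * 0.85,
    "mutate":        lambda loss: loss * random.uniform(0.8, 1.1),
    "warm_restart":  lambda loss: min(loss * 1.05, 1.0),
}

def random_pipeline(length=4):
    # The searcher can emit any composition of library elements.
    return [random.choice(list(PRIMITIVES)) for _ in range(length)]

def evaluate(pipeline, loss=1.0, steps=25):
    for _ in range(steps):
        for name in pipeline:
            loss = PRIMITIVES[name](loss)
    return loss   # lower is better

best = min((random_pipeline() for _ in range(300)), key=evaluate)
print(best)
```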
RSI is a thing that happens. But it needs quite a lot of intelligence to start. Quite possibly more intelligence than needed to automate most of the economy.
No, it is an improved version of existing RL algorithms, trained on the results from testing network and cognitive architectures on a proxy task suite. The proxy task suite is a series of cheaper-to-run tests, on smaller networks, that predict the ability of a full-scale network on your full AGI/ASI gym.
This is subhuman intelligence, but the RL network (or, later, hybrid architectures) learns from a broader set of results than humans can.
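A toy illustration of why the proxy suite both helps and caps you (all numbers invented): selection happens on noisy cheap scores, so the pick is strong but rarely the true optimum. This is limit (b) in the list below.

```python
import random
random.seed(0)

# Each candidate architecture has a hidden true full-gym score; the
# proxy suite only sees it through noise.
true_scores = [random.random() for _ in range(200)]
proxy_scores = [t + random.gauss(0, 0.15) for t in true_scores]

picked = max(range(200), key=lambda i: proxy_scores[i])  # select by proxy
best = max(range(200), key=lambda i: true_scores[i])
print("true score of proxy pick:", round(true_scores[picked], 3))
print("true score of real best: ", round(true_scores[best], 3))
```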
This is kind of true. But by the time there are no big algorithmic wins left, we are in the crazy smart, post-singularity regime.
The above method won’t settle on infinity; remember, you are still limited by compute, and remember this is 5-10 years from today. You find a stronger AGI/ASI than before.
Since you:
a. sampled the possibility space a finite number of times,
b. used proxy tasks that can’t have perfect correlation with full-scale scores,
c. started with ‘only’ every technique ever tried by humans or other AIs (your primitives library), and so searched a subset of the full search space,
d. are limited to algorithms the (training) hardware architecture supports well,
e. are limited by the noisiness of the available training data,
f. can’t train an algorithm larger than you can inference, and
g. want an algorithm that supports real-time hardware,
well, the resulting ASI will only be so powerful. Still, the breadth of training data, the lack of hardware errors, and a much larger working memory should go pretty far...
Summary: I went over a list of reasons why the described technique won’t find the global maximum, which would be the strongest superintelligence the underlying hardware can support. Strength is the harmonic mean of scores on your evaluation benchmark, which is ever growing.
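Concretely, the strength metric as described (the benchmark names here are made up):

```python
from statistics import harmonic_mean

# Harmonic mean punishes any weak spot: one low score dominates, which
# matches "crazy smart overall, still fairly dumb in a lot of ways".
scores = {"math": 0.9, "coding": 0.95, "planning": 0.85, "robotics": 0.05}
print(harmonic_mean(list(scores.values())))   # ~0.17, dragged down by robotics
```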
Although I forgot a step: I’m describing the initial version of the intelligence search algorithm built by humans, and why it saturates. Later on, you just add a test to your ASI gym for the ASI being tested to design a better intelligence, and give it all the data from the prior runs. Even then, the ASI is going to be limited by (a, d, e, f, g).
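A hypothetical sketch of that added step, everything toy-sized: the gym gains one task whose score is the strength of the successor the tested system designs from the history of prior runs.

```python
import random
from statistics import harmonic_mean

random.seed(1)

class System:
    def __init__(self, skill):
        self.skill = skill
    def design_successor(self, prior_runs):
        # Toy designer: start from the best prior candidate, improve it
        # by an amount proportional to the designer's own skill.
        best = max(prior_runs, key=lambda r: r[1])[0]
        return System(min(1.0, best.skill + 0.1 * self.skill))

def gym_strength(system, prior_runs=None):
    scores = [max(1e-6, system.skill + random.gauss(0, 0.02)) for _ in range(3)]
    if prior_runs is not None:
        # The new test: design a better intelligence. Its score is the
        # successor's strength on the ordinary (base) part of the gym.
        scores.append(gym_strength(system.design_successor(prior_runs)))
    return harmonic_mean(scores)

prior_runs = [(s, gym_strength(s)) for s in (System(random.random()) for _ in range(5))]
print(gym_strength(System(0.5), prior_runs))
```

The limits above still apply: the designer can only propose what its primitives and hardware support, from a finite number of samples.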