Indeed a model trained on (or with full access to) our internet could be very hard to contain. That is in fact a key part of my argument. If it is hard to contain, it is hard to test. If it’s hard to test, we are unlikely to succeed.
So if we agree there, then perhaps you agree that giving untested AGI access to our immense knowledge base is profoundly unwise.
I expect then that we won’t be limiting their knowledge as much as you think we will, because I expect them to be trained with giant corpuses of (mostly) post-1000BC text.
That’s then just equivalent to saying “I expect then that we won’t even bother with testing our alignment designs”. Do you actually believe that testing is unnecessary? Or do you just believe the leading teams won’t care? And if you agree that testing is necessary, then shouldn’t this be key to any successful alignment plan?
My best guess is that containment fails because a model convinces one of the people in the lab to let it access the internet.
Which is obviously nearly impossible if it doesn’t know it is in a sim, doesn’t know what a lab or the internet is, and lacks even the precursor concepts. Since you seem to be focused on the latest fad (large language models), consider the plight of poor simulated Elon Musk, who at least is aware of the sim argument.
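As a rough illustration of what limiting that knowledge could look like in practice, here is a minimal sketch of a corpus filter that drops any document mentioning a blocked concept; the blocklist, the keyword matching, and the function names are illustrative assumptions, not anyone’s actual training pipeline:

```python
# Minimal sketch: filter a training corpus so the agent never sees the
# "precursor concepts" that would let it infer it is in a lab-run sim.
# The blocklist and matching rule are illustrative assumptions only.

import re
from typing import Iterable, Iterator

BLOCKED_CONCEPTS = [
    r"\binternet\b",
    r"\bcomputer\b",
    r"\bsimulation\b",
    r"\bserver\b",
    r"\blaborator(y|ies)\b",
]
BLOCKED_RE = re.compile("|".join(BLOCKED_CONCEPTS), re.IGNORECASE)

def filter_corpus(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents that never mention a blocked concept."""
    for doc in docs:
        if not BLOCKED_RE.search(doc):
            yield doc

# Usage: clean_docs = list(filter_corpus(raw_docs))
```

Keyword filtering like this is obviously far too crude on its own; the point is just that what the agent can know is a property of the data we choose to train it on.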
Thanks for pushing back on some stuff here.
Since you seem to be focused on the latest fad (large language models)
Is this your way of saying you don’t think LLMs will scale to AGI (or AGI-level capabilities)?
So if we agree there, then perhaps you agree that giving untested AGI access to our immense knowledge base is profoundly unwise.
This seems like it will happen with economic incentives (I mean, WebGPT already exists) and models will already be trained on the entire internet, so it’s not clear to me how we’d prevent this?
Is this your way of saying you don’t think LLMs will scale to AGI (or AGI-level capabilities)?
It’s more that AGI will contain an LLM module, much as the brain contains linguistic cortex modules. Indeed, several recent neuroscience studies have shown that LLMs are functionally equivalent to the brain’s linguistic cortex: trained the same way on the same data, they end up with very similar learned representations.
It’s also obvious from LLM scaling that the larger LLMs are already comparable to the linguistic cortex, yet lag the brain in many core capabilities. You don’t get those capabilities just by making a larger vanilla LLM / linguistic cortex.
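As a sketch of what that modular picture could mean in code (all names and interfaces here are purely illustrative assumptions, not any existing system’s API), the LLM would be one callable among several, with action selection handled by a separate planning module:

```python
# Rough sketch of "LLM as linguistic-cortex module": the language model
# is one component of the agent, not the whole agent. All interfaces
# here are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ModularAgent:
    # Each module is just a callable here; in a real system these would
    # be separately trained networks with their own objectives.
    perceive: Callable[[bytes], str]       # vision/audio -> symbolic summary
    language: Callable[[str], str]         # LLM: text in, text out
    plan: Callable[[str, List[str]], str]  # goal + memories -> next action
    memory: List[str] = field(default_factory=list)

    def step(self, observation: bytes, goal: str) -> str:
        percept = self.perceive(observation)
        # The LLM only describes/interprets; it does not choose actions.
        description = self.language(f"Describe the situation: {percept}")
        self.memory.append(description)
        return self.plan(goal, self.memory)
```

The point of the sketch is that scaling the language module alone doesn’t supply the planning and world-model capabilities the other modules provide.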
This seems like it will happen with economic incentives (I mean, WebGPT already exists) and models will already be trained on the entire internet
Training just a language module on the internet is not dangerous by itself, but yes, the precedent is of course concerning. There are now several projects working on multi-modal foundation agents that control virtual terminals with full web access. If that trend continues and reaches AGI before the safer sim/game training path does, then we may be in trouble.
so it’s not clear to me how we’d prevent this?
Well, if we agree that we obviously can contain human-level agents in sims if we want to, then preventing it comes down to some combination of the standard approaches: spreading awareness, advocacy, and advancing sim techniques.
Consider, for example, if there were a complete alignment solution available today, and it necessarily required changing how we train models. Clearly, the fact that there is currently inertia in the wrong direction can’t be a knockdown argument against doing the right thing. If you learn your train is on a collision course, you change course or jump.