I think one problem with this concept is that the “restrictions” might turn out to be very onerous, preventing the good guys (using “restrictions”) from winning a complete unilateral victory over everyone else. One of the major anticipated benefits of superhuman AI systems is the ability to work effectively even on vague, broad, difficult tasks that span multiple different domains. If you are committed to creating a totally air-gapped, high-security system, where you only hand your AI the “smallest subdividable subtask” and only give it access to a small amount of sanitized context, you will probably end up losing (militarily, economically, whatever) to someone who uses AI in a less-restricted way (even if their AI model is somewhat worse).
So it seems like, if you think alignment is impossible and the “restriction” path is the only way, you shouldn’t be imagining getting lots of AI help (combat drones, etc.), since in any scenario where you’ve got AI help, your non-restrictionist opponents probably have EVEN MORE AI help. So you might as well just launch your global takeover today, while AI is weak, since your military/economic/etc. advantage will probably only erode with every advance in AI capabilities.
I agree with this criticism. What you have done in this design is create a large bureaucracy of AI systems that essentially will not respond when anything unexpected happens (input outside the training distribution) and that are inflexible: anything other than the task assigned at the moment is “not my job / not my problem.” They can still have superintelligent subtask performance, and the training set can include all available video on earth, so they can respond to any situation they have ever seen humans handle; it’s not as inflexible as it might sound. I think this is going to work extremely well compared to what we have now.
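To make that gating behavior concrete, here is a rough Python sketch. Every name in it (Subtask, RestrictedWorker, ood_detector, and so on) is made up for illustration, not taken from any actual design: each worker refuses anything that isn’t its assigned subtask or that looks out of distribution, and is only trusted on its narrow job.

```python
# Rough sketch only -- all names and interfaces here are hypothetical.
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Subtask:
    task_id: str
    payload: Dict[str, Any] = field(default_factory=dict)

class RestrictedWorker:
    """One 'bureaucrat' in the hierarchy: superhuman on its narrow subtask, useless otherwise."""

    def __init__(self, assigned_task_id: str, ood_detector, model, ood_threshold: float = 0.95):
        self.assigned_task_id = assigned_task_id
        self.ood_detector = ood_detector  # scores how in-distribution an input looks (assumed interface)
        self.model = model                # frozen, narrowly trained model (assumed interface)
        self.ood_threshold = ood_threshold

    def handle(self, subtask: Subtask) -> Dict[str, Any]:
        # "Not my job": refuse anything other than the task assigned at the moment.
        if subtask.task_id != self.assigned_task_id:
            return {"status": "refused", "reason": "outside assigned subtask"}
        # "Not my problem": refuse inputs that fall outside the training distribution.
        if self.ood_detector.score(subtask.payload) < self.ood_threshold:
            return {"status": "refused", "reason": "input outside training distribution"}
        # Otherwise act, possibly with superintelligent competence on this one narrow task.
        return {"status": "ok", "result": self.model.run(subtask.payload)}
```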
But yes, if this doesn’t let you get close to the limits of what intelligence makes possible, “unrestricted” systems might win. It depends.
As a systems engineer myself, I don’t see unrestricted systems going anywhere. The issue isn’t that they couldn’t be cognitively capable of a lot; it’s that in the near term, when you try to use them, they will make too many mistakes to trust them with anything that matters. And those errors are uncorrectable: without a structure like the one described, there is a lot of design coupling, so making the system better at one thing with feedback comes at a cost somewhere else, and so on.
It’s easy to talk about an AI system with some enormous architecture of a thousand modules, more like a brain, that learns online from all the tasks it is doing. Hell, it even has a module editor so it can add new modules whenever it chooses.
But... how do you validate or debug such a system? It’s learning from all inputs; it’s a constantly changing technological artifact. In practice this is infeasible: when it makes a catastrophic error, there is nothing you can do to fix it. Any test set you add to train it on the scenario where it made the mistake is not guaranteed to fix the error, because the system keeps evolving out from under your tests...
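To illustrate, a toy sketch (hypothetical objects and methods, not any real framework): a regression suite run against an online learner only certifies the weights as they existed at that instant, because every subsequent update changes the artifact you tested.

```python
# Rough sketch only -- model, failure_cases, and their methods are hypothetical stand-ins.

def regression_suite_passes(model, failure_cases) -> bool:
    """Replay previously observed failure scenarios against the current model."""
    return all(model.predict(case.inputs) == case.expected for case in failure_cases)

def serve_traffic(model, failure_cases, live_inputs):
    # With a frozen artifact, passing once certifies every future copy of it.
    assert regression_suite_passes(model, failure_cases)  # true *right now*

    for x in live_inputs:
        y = model.predict(x)
        model.update(x, y)  # online learning: the weights change on every input...

    # ...so nothing guarantees regression_suite_passes(model, failure_cases)
    # still holds here, even though no test was removed or weakened.
    # The thing you certified no longer exists.
```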
Forget alignment, getting such a system to reliably drive a garbage truck would be risky.
I can’t deny such a system might work, however...
Well, crap. It might work extremely well and outperform everything else, becoming more and more unauditable over time. Because we humans would apply simulated performance tests as constraints, as well as real-world KPIs: “Add whatever modules to yourself, however you want; just ace these tests and do well on your KPIs, and we don’t care how you do it...”
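That oversight regime amounts to something like this toy loop (all names hypothetical): the only gates are outcome-based, so the internals are free to drift arbitrarily far from anything a human could audit.

```python
# Rough sketch only -- system, benchmark, and kpi_feed are hypothetical objects;
# the point is what the loop does NOT check.

def outcome_only_oversight(system, benchmark, kpi_feed,
                           score_floor: float, kpi_floor: float, steps: int):
    for _ in range(steps):
        # The system is free to restructure its own internals however it likes.
        system.edit_own_modules()

        # The only gates applied are outcome-based: benchmark score and KPIs.
        if benchmark.score(system) < score_floor or kpi_feed.latest(system) < kpi_floor:
            system.rollback()
            continue

        # Both gates passed: whatever internal changes were made are kept, unexamined.
        # "We don't care how you do it."
        system.commit()
```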
That’s precisely how you arrive at a machine that is internally unrestricted and thus potentially an existential threat.