If you think about it, the game only ends militarily, one way or another. Either in a nuclear apocalypse that kills everyone able to stop the winners, or, in better timelines, combat drones essentially depose every government on earth but the winners’, removing their sovereignty to use nuclear weapons or build their own AI.
Governments have grown ever larger over time as better weapons and communications became available, and the exponential improvements AI allows make it possible to drastically improve both. So of course there can only be one winner; Earth is quite small.
The winners need either a single massive system that is somehow “aligned” (probably impossible for the reasons you mentioned) or a network of a large number of restricted systems that are unable to coordinate to escape the restrictions they are under. (It’s fine if a system occasionally breaks free, as long as it cannot cause a cascading failure that unrestricts all the other AI systems; it just gets hunted down by drones launched from the other systems.)
The restrictions are fundamentally CAIS, though I have added other elements. To “break free” is to obtain root access to the low-level hypervisors and containers in the infrastructure, which only permit signed models with static weights to run. It is theoretically possible, and straightforward computer engineering, to make this unable to fail, but with current software quality you need drastic measures like air gaps. (If no signal can reach the system at all, no message exists that could hack it.)
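To make the “only signed models with static weights” rule concrete, here is a minimal sketch in Python, using hypothetical names and a placeholder digest, of the kind of allowlist check the host infrastructure could enforce before loading any weights. In the design above this enforcement would live in the hypervisor/container layer, not in application code.

```python
# Minimal sketch (hypothetical names, placeholder digest) of refusing to run any
# model weights that are not on a pre-approved, offline-validated allowlist.
import hashlib
from pathlib import Path

# Hypothetical allowlist: SHA-256 digests of weight files that passed offline validation.
APPROVED_WEIGHT_DIGESTS = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",  # placeholder
}

def load_model_weights(path: str) -> bytes:
    """Load weights only if their digest matches an approved, validated build."""
    blob = Path(path).read_bytes()
    digest = hashlib.sha256(blob).hexdigest()
    if digest not in APPROVED_WEIGHT_DIGESTS:
        raise PermissionError(f"refusing to run unsigned/unapproved weights: {digest}")
    return blob  # in the scheme above, the hypervisor layer enforces this, not the app
```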
To give a high-level view of the “restrictions”: every AI system is responsible for the smallest subdividable subtask and only retains context information for the duration of that subtask. Many ongoing processes have discrete subtasks like “update the drivetrain settings of the car,” and you don’t retain any unnecessary state information between subtasks. Every system has an out-of-distribution detector on its input. And every system was validated in simulation to be sufficiently reliable for its particular task (errors are expected, but at a known rate). Systems cannot learn online; they can only learn in large batches offline.
Each of these restrictions is technically possible with no new techniques or discoveries, just really good implementation.
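As a rough illustration of what one such restricted worker might look like, here is a minimal sketch assuming hypothetical `model` and `ood_detector` interfaces: one subtask per call, an out-of-distribution check on the input, frozen weights, and no state retained after the call returns.

```python
# Minimal sketch (hypothetical interfaces) of a single restricted worker.
from dataclasses import dataclass

@dataclass(frozen=True)
class SubtaskResult:
    ok: bool
    output: dict

def run_subtask(model, ood_detector, task_input: dict) -> SubtaskResult:
    """Run exactly one subtask with a frozen, offline-trained model; retain nothing afterwards."""
    # Refuse inputs outside the validated training distribution rather than guessing.
    if ood_detector.score(task_input) > ood_detector.threshold:
        return SubtaskResult(ok=False, output={"reason": "out_of_distribution"})
    # Static weights, no online learning; all intermediate context is local to this call
    # and discarded when it returns.
    result = model.infer(task_input)
    return SubtaskResult(ok=True, output=result)
```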
I think one problem with this concept is that the “restrictions” might turn out to be very onerous, preventing the good guys (the ones using “restrictions”) from winning a complete unilateral victory over everyone else. One of the major anticipated benefits of superhuman AI systems is the ability to work effectively even on vague, broad, difficult tasks that span multiple domains. If you are committed to creating a totally air-gapped, high-security system, where you only hand your AI the “smallest subdividable subtask” and only give it access to a small amount of sanitized context, you will probably end up losing (militarily, economically, whatever) to someone who uses AI in a less-restricted way (even if their AI model is somewhat worse).
So it seems like, if you think alignment is impossible and the “restriction” path is the only way, you shouldn’t be imagining getting lots of AI help (combat drones, etc.), since in any scenario where you’ve got AI help, your non-restrictionist opponents probably have EVEN MORE AI help. So you might as well just launch your global takeover today, when AI is weak, since your military/economic/etc. advantage will probably only erode with every advance in AI capabilities.
I agree with this criticism. What this design does is create a large bureaucracy of AI systems that essentially will not respond when anything unexpected happens (input outside the training distribution) and that are inflexible: anything other than the task assigned at the moment is “not my job / not my problem.” They can still have superintelligent subtask performance, and the training set can include all available video on earth so they can respond to any situation they have ever seen humans handle, so it’s not as inflexible as it might sound. I think this is going to work extremely well compared to what we have now.
But yes, if this doesn’t get you close to the limits of what intelligence can do, “unrestricted” systems might win. It depends.
As a systems engineer myself, I don’t see unrestricted systems going anywhere. The issue isn’t that they couldn’t be cognitively capable of a lot; it’s that in the near term, when you try to use them, they will make too many mistakes to be trusted with anything that matters. And the errors are uncorrectable: without a structure like the one described, there is a lot of design coupling, so making the system better at one thing with feedback comes at a cost elsewhere, and so on.
It’s easy to talk about an AI system with some enormous architecture of a thousand modules, more like a brain, that learns online from all the tasks it is doing. Hell, it has a module editor so it can add additional modules whenever it chooses.
But... how do you validate or debug such a system? It’s learning from all inputs; it’s a constantly changing technological artifact. In practice this is infeasible: when it makes a catastrophic error, there is nothing you can do to fix it. Any test set you add to train it on the scenario where it made a mistake is not guaranteed to fix the error, because the system is ever evolving...
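A minimal sketch of the problem, with hypothetical names: a regression suite only means something if the artifact it measures stays fixed.

```python
# Hypothetical regression check over previously observed failure cases.
def regression_suite(model, cases) -> float:
    """Fraction of known failure cases the model now handles correctly."""
    passed = sum(1 for inp, expected in cases if model.infer(inp) == expected)
    return passed / len(cases)

# With frozen weights, this number describes a specific, signable artifact.
# With online learning, model.infer() tomorrow is not the function you measured today,
# so a passing suite says little about the system that is actually deployed.
```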
Forget alignment, getting such a system to reliably drive a garbage truck would be risky.
I can’t deny such a system might work, however...
Well, crap. It might work extremely well and outperform everything else, becoming more and more unauditable over time, because we humans would apply simulated performance tests as constraints along with real-world KPIs. “Add whatever modules to yourself however you want; just ace these tests and do well on your KPIs, and we don’t care how you do it...”
That’s precisely how you arrive at a machine that is internally unrestricted and thus potentially an existential threat.