To solve the computer control problem we need to decompose it into parts. This article presents one decomposition.
The primary decomposition is into the problems of isolation, access control and resource allocation. Isolation stops programs from interacting and access controls define how some interactions can be selectively allowed. It is the resource allocation problem where the meat of the computer control problem is, if the user can get the correct resource allocations then they control the activity of the whole system.
One scale-able system worth exploring for resource allocations is a market based solution where programs can bid on resources based on the amount of credit they have. Good programs can get the resources they need over bad programs. If this problem transformation is followed, it changes the question of resource allocation into the credit allocation problem.
Each part of the decomposition is put into context of work in computing, both contemporary and historical.
What type of automated system can we implement to stop a normal general purpose computer system misbehaving (and carry on with its good behaviour ) if it has a malign program in it.
So how do we break this down? What does it mean to misbehave, how can we split up the computer system so that we can talk about problems with different parts of it. Let us start with defining some terms: resources, behaviour and the resources relationship to a program, an allocation.
Resource: A part of the computer system.
Behaviour: Any action within the system upon the resources of the system. These include using energy by calculating, using memory bandwidth, writing something to the display, altering another program. Anything the user might have an opinion on or might have an opinion on the impact of that action.
Allocation: The ability for a program to perform a behaviour in a system.
Isolation problem
The ability to have one program have any sort of control of computer is the isolation problem. If a malign program can rewrite a good program, it is game over. We need to stop one program stomping over rest. This has been studied extensively in computer security. We have moved from having a single program in control of all the resources of a program, through virtual memory to full virtual machines where programs or sets of programs can be almost totally isolated off from each other. Qubes is an example of trying to use virtual machine isolation to make a more secure computer for end users.
Isolation is hard on modern computers for low level cpu resources such as memory bandwidth or layer 3 cache. There may still be work to be done on this problem. It is possible that we will need to get different hardware to solve the isolation problem satisfactorily.
Access control problem
However fully isolated programs are pointless. They have no impact on each other, or on the outside world. This brings us to the subject of access control. There have been two main types of access control identified in computing today, access control lists and capabilities.
Access control lists: each resource maintain a list of programs that can perform behaviours with it. This is the standard way most role based and other access control works.
Capabilities: Programs are given tokens that allow them to perform a behaviour on a resource.
They are not equivalent due to ambient authority. If a program is on an access control list and only its identity is needed to access a resource, then it can be tricked into using that authority in a different context than expected (see the confused deputy problem for more detail). Ambient authority does make the program easier to write as you don’t have to pass the capability token through the system. However the protection against logic problems and some of the efficiencies of implementation mean that I favour capabilities for access control. But it is worth noting that some resources such as processing power are not easily represented in normal object capabilities.
Allocation problem
Isolation and access controls never tell you how the programs should be allocated resources. It just gives you a language to describe the allocations and access. The more detailed and fine grained the allocations the safer the system, but the more overhead the user has in judging which programs gets which allocations. We also want allocations to be temporary, to stop the malign program being given an allocation, so it is insufficient to do these allocations once. We must do it over and over again, as programs are updated become less important or we create new ones.
Relying on humans to create and manage allocations, is not a scale-able solution to the normal computer control problem.
The core of the normal computer control problem is the resource allocation problem: How to create a system to manage the allocation of resources on behalf of the user.
So how do we create the allocations? Programs can’t just ask for allocations. Malign programs could just ask for all the resources. How do we give resources to good programs and not to bad?
Let us cheat for a moment and assume we can figure out how good a program is. We could allocate resources by having programs ask for them and the allocations being determined by how “good” it was. So the best programs got the resources they need. To stop the value of a program being doubly counted they would have to spend that value. Programs might also value different resources differently, so that they might chose to spend their value differently. This is looking a lot like an auction based solution to the resource allocation problem. A project with a similar thrust to this, agorics never got very far. It was mainly interested in monetary credit, but the ideas can be adapted to a virtual currencies. It seems like an interesting space to explore. The exact auction mechanism will be explored later. We are getting off-topic; we still lack a way of allocating this currency to programs.
Credit assignment problem
Allocating currencies per program is also too complicated for a human. Allocating a total value for how well the system is doing seems realistic for a human (although we may end up wanting more signals than just this, eventually). This looks a lot like a reinforcement learning problem. One RL system I’ve taken inspiration from is learning classifier systems. Learning classifier system has a set of rules with a simplified market. The rule that wins control of the system choose the action and gets the reward it spreads the value backwards to previous programs that had control of the actions in a fixed way.
In a full computer system this is a lot more complicated. To convert a utility signal into currency within my system, one resource of the system is the “responsibility” resource. Programs that win the auction for “responsibility” get currency based on feedback from the user the system for that period of allocation. There would have to be a baseline of currency coming into the system with the user pushing it up or down depending upon how the system was doing. The program with “responsibility” would have oversight into the system and see which programs helped them do good and allocate currency to them. Those programs that got this currency could also allocate currency to other programs that directly helped them. In this way value would spread throughout the system, through lots of little decisions spread around.
Conclusion
We’ve had a whistle stop tour of the normal computer control problem and seen some of the possible approaches to the sub problems.
I favour pushing more responsibility for the management of the system into the programs and also decentralizing that responsibility as much as possible between the problems. Programs have to figure out how much they value a specific resources and they have to figure out how much other programs have helped them to give them resources. This can be seen as both delaying the solution and also allowing the system to have flexibility in how it finally solves it.
The big unknowns are: how this sort of system would actually behave in aggregate (what the evolutionary dynamics look like) and the strategies needed for the programs to make the system robust. No doubt the strategies will be become more complex in time, but there needs to be relatively simple strategies that do okay. Else it will be too easy to attack and subvert the system.
If we solve this problem we can look at introducing programs that create new programs and be happy that the new “good” programs will be the ones that survive.
Decomposing the normal computer control problem
To solve the computer control problem we need to decompose it into parts. This article presents one decomposition.
The primary decomposition is into the problems of isolation, access control and resource allocation. Isolation stops programs from interacting and access controls define how some interactions can be selectively allowed. It is the resource allocation problem where the meat of the computer control problem is, if the user can get the correct resource allocations then they control the activity of the whole system.
One scale-able system worth exploring for resource allocations is a market based solution where programs can bid on resources based on the amount of credit they have. Good programs can get the resources they need over bad programs. If this problem transformation is followed, it changes the question of resource allocation into the credit allocation problem.
Each part of the decomposition is put into context of work in computing, both contemporary and historical.
The normal computer control problem was defined as:
So how do we break this down? What does it mean to misbehave, how can we split up the computer system so that we can talk about problems with different parts of it. Let us start with defining some terms: resources, behaviour and the resources relationship to a program, an allocation.
Resource: A part of the computer system.
Behaviour: Any action within the system upon the resources of the system. These include using energy by calculating, using memory bandwidth, writing something to the display, altering another program. Anything the user might have an opinion on or might have an opinion on the impact of that action.
Allocation: The ability for a program to perform a behaviour in a system.
Isolation problem
The ability to have one program have any sort of control of computer is the isolation problem. If a malign program can rewrite a good program, it is game over. We need to stop one program stomping over rest. This has been studied extensively in computer security. We have moved from having a single program in control of all the resources of a program, through virtual memory to full virtual machines where programs or sets of programs can be almost totally isolated off from each other. Qubes is an example of trying to use virtual machine isolation to make a more secure computer for end users.
Isolation is hard on modern computers for low level cpu resources such as memory bandwidth or layer 3 cache. There may still be work to be done on this problem. It is possible that we will need to get different hardware to solve the isolation problem satisfactorily.
Access control problem
However fully isolated programs are pointless. They have no impact on each other, or on the outside world. This brings us to the subject of access control. There have been two main types of access control identified in computing today, access control lists and capabilities.
Access control lists: each resource maintain a list of programs that can perform behaviours with it. This is the standard way most role based and other access control works.
Capabilities: Programs are given tokens that allow them to perform a behaviour on a resource.
They are not equivalent due to ambient authority. If a program is on an access control list and only its identity is needed to access a resource, then it can be tricked into using that authority in a different context than expected (see the confused deputy problem for more detail). Ambient authority does make the program easier to write as you don’t have to pass the capability token through the system. However the protection against logic problems and some of the efficiencies of implementation mean that I favour capabilities for access control. But it is worth noting that some resources such as processing power are not easily represented in normal object capabilities.
Allocation problem
Isolation and access controls never tell you how the programs should be allocated resources. It just gives you a language to describe the allocations and access. The more detailed and fine grained the allocations the safer the system, but the more overhead the user has in judging which programs gets which allocations. We also want allocations to be temporary, to stop the malign program being given an allocation, so it is insufficient to do these allocations once. We must do it over and over again, as programs are updated become less important or we create new ones.
Relying on humans to create and manage allocations, is not a scale-able solution to the normal computer control problem.
The core of the normal computer control problem is the resource allocation problem: How to create a system to manage the allocation of resources on behalf of the user.
So how do we create the allocations? Programs can’t just ask for allocations. Malign programs could just ask for all the resources. How do we give resources to good programs and not to bad?
Let us cheat for a moment and assume we can figure out how good a program is. We could allocate resources by having programs ask for them and the allocations being determined by how “good” it was. So the best programs got the resources they need. To stop the value of a program being doubly counted they would have to spend that value. Programs might also value different resources differently, so that they might chose to spend their value differently. This is looking a lot like an auction based solution to the resource allocation problem. A project with a similar thrust to this, agorics never got very far. It was mainly interested in monetary credit, but the ideas can be adapted to a virtual currencies. It seems like an interesting space to explore. The exact auction mechanism will be explored later. We are getting off-topic; we still lack a way of allocating this currency to programs.
Credit assignment problem
Allocating currencies per program is also too complicated for a human. Allocating a total value for how well the system is doing seems realistic for a human (although we may end up wanting more signals than just this, eventually). This looks a lot like a reinforcement learning problem. One RL system I’ve taken inspiration from is learning classifier systems. Learning classifier system has a set of rules with a simplified market. The rule that wins control of the system choose the action and gets the reward it spreads the value backwards to previous programs that had control of the actions in a fixed way.
In a full computer system this is a lot more complicated. To convert a utility signal into currency within my system, one resource of the system is the “responsibility” resource. Programs that win the auction for “responsibility” get currency based on feedback from the user the system for that period of allocation. There would have to be a baseline of currency coming into the system with the user pushing it up or down depending upon how the system was doing. The program with “responsibility” would have oversight into the system and see which programs helped them do good and allocate currency to them. Those programs that got this currency could also allocate currency to other programs that directly helped them. In this way value would spread throughout the system, through lots of little decisions spread around.
Conclusion
We’ve had a whistle stop tour of the normal computer control problem and seen some of the possible approaches to the sub problems.
I favour pushing more responsibility for the management of the system into the programs and also decentralizing that responsibility as much as possible between the problems. Programs have to figure out how much they value a specific resources and they have to figure out how much other programs have helped them to give them resources. This can be seen as both delaying the solution and also allowing the system to have flexibility in how it finally solves it.
The big unknowns are: how this sort of system would actually behave in aggregate (what the evolutionary dynamics look like) and the strategies needed for the programs to make the system robust. No doubt the strategies will be become more complex in time, but there needs to be relatively simple strategies that do okay. Else it will be too easy to attack and subvert the system.
If we solve this problem we can look at introducing programs that create new programs and be happy that the new “good” programs will be the ones that survive.