How the virtual AI controls itself
A putative new idea for AI control; index here.
In previous posts, I posited AIs caring only about virtual worlds—in fact, being defined as processes in virtual worlds, similarly to cousin_it’s idea. How could this go? We would want the AI to reject offers of outside help—be they ways of modifying its virtual world, or ways of giving it extra resources.
Let V be a virtual world, over which a utility function u is defined. The world accepts a single input string O. Let P be a complete specification of an algorithm, including the virtual machine it is run on, the amount of memory it has access to, and so on.
Fix some threshold T for u (to avoid the subtle weeds of maximising). Define the statement:
r(P,O,V,T): “P(V) returns O, and either E(u|O)>T or O=∅”
And the string-valued program:
Q(V,P,T): “If you can find that there exists a non-empty O such that r(P,O,V,T), return O. Else return ∅.”
Here “find” and “E” are where the magic-super-intelligence-stuff happens.
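To make the two definitions concrete, here is a minimal toy sketch in Python. It is not the construction itself: the "find" step is replaced by a brute-force search over short strings, E(u|O) is supplied directly as a function, and the self-reference in Q(V,Q,T) is collapsed, since the toy program simply returns what it found. The names `World`, `candidate_outputs`, `q`, `r`, and the `alphabet` and `max_len` parameters are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator

EMPTY = ""  # stands in for the empty output ∅


@dataclass(frozen=True)
class World:
    """Toy virtual world V (hypothetical): only exposes an estimate of E(u|O)."""
    expected_utility: Callable[[str], float]


def candidate_outputs(alphabet: str, max_len: int) -> Iterator[str]:
    """Enumerate non-empty strings, shortest first (a crude stand-in for 'find')."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def q(world: World, threshold: float, alphabet: str = "ab", max_len: int = 3) -> str:
    """Toy Q: return the first non-empty O with E(u|O) > T, else the empty string."""
    for output in candidate_outputs(alphabet, max_len):
        if world.expected_utility(output) > threshold:
            return output
    return EMPTY


def r(program: Callable[[World], str], output: str, world: World, threshold: float) -> bool:
    """Toy r(P,O,V,T): P(V) returns O, and either E(u|O) > T or O is empty."""
    returns_output = program(world) == output
    good_enough = output == EMPTY or world.expected_utility(output) > threshold
    return returns_output and good_enough
```

With a utility that counts occurrences of `a`, calling `q(World(lambda o: o.count("a")), 1.5)` searches `a`, `b`, then `aa`, and stops there. The point of the sketch is only that the threshold check is evaluated inside the fixed world V; nothing in it captures the proof search that the real Q would need.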
Now, it seems to me that Q(V,Q,T) is the program we are looking for. It is uninterested in offers to modify the virtual world, because E(u|O)>T is defined over the unmodified virtual world. We can set it up so that the first thing it proves is something like “If I (i.e. Q) prove E(u|O)>T, then r(Q,O,V,T).” If we offer it more computing resources, it can no longer make use of that assumption, because “I” will no longer be Q.
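The role of "I" in that argument can be sketched as an identity check on the full specification. The Python below is a toy illustration under stated assumptions: `Spec`, `may_use_self_lemma`, and `act` are hypothetical names, a specification is reduced to three fields, and falling back to the empty output when the check fails is a choice made for this sketch, not something the argument above establishes.

```python
from dataclasses import dataclass
from typing import Optional

EMPTY = ""  # stands in for the empty output ∅


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for a complete specification P (algorithm plus resources)."""
    algorithm: str
    memory_bytes: int
    step_budget: int


def may_use_self_lemma(own: Spec, running: Spec) -> bool:
    """The lemma 'if I prove E(u|O) > T then r(Q,O,V,T)' is about Q exactly.

    A process running under any other specification is not Q, so it cannot
    invoke the lemma about itself.
    """
    return own == running


def act(own: Spec, running: Spec, proved_output: Optional[str]) -> str:
    """Return the proved output only when the running process really is Q.

    Assumption of this sketch: when the lemma is unavailable, return EMPTY.
    """
    if proved_output and may_use_self_lemma(own, running):
        return proved_output
    return EMPTY
```

For example, with `q_spec = Spec("Q", 1024, 1000)`, the call `act(q_spec, q_spec, "aa")` passes the proved output through, while `act(q_spec, Spec("Q", 2048, 1000), "aa")` does not, because doubling the memory changes the specification.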
Does this seem like a possible way of phrasing the self-containment requirement? For the moment, it seems to make the AI reject small offers of extra resources and be indifferent to large ones.