Some vague ideas about decision theory math floating in my head right now. Posting them in this raw state because my progress is painfully slow and maybe someone will have the insight that I’m missing.
1) thescoundrel has suggested that spurious counterfactuals can be defined as counterfactuals with long proofs. How far can we push this? Can there be a “complexity-based decision theory”?
2) Can we write a version of this program that would reject at least some spurious proofs?
3) Define problem P1 as “output an action that maximizes utility”, and P2 as “output a program that solves P1”. Can we write a general enough agent that solves P1 correctly, and outputs its own source code as the answer to P2? To stop the agent from solving P1 as part of solving P2, we can add a resource restriction to P2 but not to P1. This is similar to Eliezer’s “AI reflection problem”.
Thoughts on problem 3:
def P1():
    sumU = 0
    for (#U = 1; #U < 3^^^3; #U++):
        if #U encodes a well-defined boundedly-recursive parameterless function
                that calls an undefined single-parameter function "A" with #U as a parameter:
            sumU += eval(#U + #A)
    return sumU
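As a concrete (and badly oversimplified) illustration of what P1 is totaling, here is a toy Python analogue. It is my own sketch, not the construction above: instead of enumerating Gödel numbers up to 3^^^3, it hand-picks a small list of universe functions, and `toy_P1`, `universes`, and `toy_agent` are all hypothetical names.

```python
# Toy analogue of P1 (an assumption-laden sketch, not the real construction):
# each "universe program" is just a Python function that takes the agent and
# returns a utility. P1's job is then to total the utility across universes.

def toy_P1(agent, universes):
    """Sum the utility the given agent obtains in every candidate universe."""
    return sum(universe(agent) for universe in universes)

# Two toy universes; each calls the agent with an identifier standing in for #U.
universes = [
    lambda A: 10 if A("newcomb") == "one-box" else 1,
    lambda A: 5 if A("coin") == "heads" else 0,
]

def toy_agent(universe_id):
    """A hand-coded 'optimal' agent for these two toy universes."""
    return {"newcomb": "one-box", "coin": "heads"}[universe_id]

print(toy_P1(toy_agent, universes))  # 15
```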
def P2():
    sumU = 0
    for (#U = 1; #U < 3^^^3; #U++):
        if #U encodes a well-defined boundedly-recursive parameterless function
                that calls an undefined single-parameter function "A" with #U as a parameter:
            code = A(#P2)
            sumU += eval(#U + code)
    return sumU
def A(#U):
    Enumerate proofs by length L = 1 ... INF:
        if found any proof of the form "A()==a implies eval(#U + #A)==u,
                and A()!=a implies eval(#U + #A)<=u":
            break
    Enumerate proofs by length up to L+1 (or more):
        if found a proof that A()!=x:
            return x
    return a
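The proof the agent searches for has the template “A()==a implies U()==u, and A()!=a implies U()<=u”. In a toy setting where the universe can simply be simulated, that template degenerates into an argmax check. The sketch below (my own construction, all names hypothetical) shows just that degenerate case, with direct simulation standing in for proof enumeration; the diagonal step against spurious proofs has no analogue here and is omitted.

```python
# Degenerate toy of the agent A: simulation replaces proof search.
# We "prove" the statement  A()==a implies U()==u, and A()!=a implies U()<=u
# by directly evaluating U at every candidate action.

def toy_A(U, actions):
    for a in actions:
        u = U(a)
        # The simulated analogue of finding the proof: action a achieves u,
        # and no alternative action does strictly better.
        if all(U(b) <= u for b in actions):
            return a

# Usage: a 5-and-10 style universe.
U = lambda action: {"five": 5, "ten": 10}[action]
print(toy_A(U, ["five", "ten"]))  # ten
```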
Although A(#P2) won’t return #A, I think eval(A(#P2)(#P2)) will return A(#P2), which will therefore be the answer to the reflection problem.
2) Can we write a version of this program that would reject at least some spurious proofs?
It’s trivial to do at least some:
def A(P):
    if P is a valid proof that "A(P)==a implies U()==u, and A(P)!=a implies U()<=u",
            and P does not contain a proof step "A(P)==x" or "A(P)!=x" for any x:
        return a
    else:
        do whatever
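The filtering condition itself is mechanical enough to sketch. Treating a “proof” as a list of step strings (my own representation, not anything from the post), the rejection test is just a scan for steps that directly assert the agent’s own output:

```python
import re

# Toy of the filtering condition: a "proof" is a list of step strings, and we
# reject it if any step asserts A(P)==x or A(P)!=x directly.

STEP_PATTERN = re.compile(r"A\(P\)\s*(?:==|!=)")

def mentions_own_output(proof_steps):
    return any(STEP_PATTERN.search(step) for step in proof_steps)

print(mentions_own_output(["A(P)==1 implies U()==10"]))  # True  -> reject proof
print(mentions_own_output(["U()<=10 for all actions"]))  # False -> accept proof
```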
Sure, but that’s too trivial for my taste :-( You understand the intent of the question, right? It doesn’t call for “an answer”; it calls for ideas that might lead toward “the answer”.
To tell the truth, I just wanted to write something, to generate some activity. The original post seems important and useful, in that it states several well-defined and interesting problems. Seeing it sit alone in the relative obscurity of an Open Thread for even a day was a little disheartening :)