Hm. In his talk, Eliezer explains that PA(n) for some large n does not solve the problem of how an AI should reason when rewriting itself, and explains why just saying “if you can prove something in this system then it’s true” doesn’t work. My reason for designing Quirrell’s game was to have a crisp setting in which the PA(n) hierarchy fails, so that I could test whether my PA_K idea could solve it. So...
This is not crisp, but I think my requirement would be that if we replace Quirrell by the AI’s untrusted mathematical intuition module proposing rewrites of the AI, then the solution should still be doing something useful. A quining solution would only accept a proposed rewrite if the proposed new source is exactly identical to the old source, which—isn’t helpful?
Allow unlimited “hold”, deduct “hold points” for it, oblige Quirrell to offer any program infinitely many times and double the score only if the new program is note equivalent to any of the previous ones (otherwise, replace without altering score)? Maybe “hold” should work the same, too (no penalty if equivalent program was once rejected).
Hm. In his talk, Eliezer explains that PA(n) for some large n does not solve the problem of how an AI should reason when rewriting itself, and explains why just saying “if you can prove something in this system then it’s true” doesn’t work. My reason for designing Quirrell’s game was to have a crisp setting in which the PA(n) hierarchy fails, so that I could test whether my PA_K idea could solve it. So...
This is not crisp, but I think my requirement would be that if we replace Quirrell by the AI’s untrusted mathematical intuition module proposing rewrites of the AI, then the solution should still be doing something useful. A quining solution would only accept a proposed rewrite if the proposed new source is exactly identical to the old source, which—isn’t helpful?
Allow unlimited “hold”, deduct “hold points” for it, oblige Quirrell to offer any program infinitely many times and double the score only if the new program is note equivalent to any of the previous ones (otherwise, replace without altering score)? Maybe “hold” should work the same, too (no penalty if equivalent program was once rejected).
I really don’t think incorporating halting-complete problems into the rules is a step forward.
Well, you can replace it with “equal as a string”, and it is only about scoring anyway.