Erm, I am not sure Quirrell would show you AI_K, either. The current setting supposes a Quirrell who wants you to accept his programs (otherwise, why would the first round not be “double_down();”?) and presupposes that immediate safety is obvious.
Maybe adding a “hold” option (preserve score, ask for the next program) could improve the setting somewhat.
Well, my idea is that Quirrell’s interest is to teach you a valuable lesson, by showing you a program whose safety is genuinely non-obvious to you, so the first program may or may not be safe. If it happens to be unsafe and you double down anyway, Quirrell is the type of teacher who thinks you’ll learn the best lesson about your overconfidence by suffering the consequences of self-destruction.
That’s just the story, though. What I need formally is a setting where there’s a clear separation between “safe” and “unsafe” rewrites, and our program has to decide whether a proposed rewrite is definitely safe or possibly unsafe. For this toy setting, I wanted that to be the only choice the program has to make, because if there were other choices, you could argue that you should only accept a rewrite if the rewrite is good at making those other choices, and good at only accepting rewrites which are good at these other choices, etc. -- which is a problem that will need to be solved, but not something I wanted to focus on in this post.
The hold option should forever ban Quirrell from offering that exact source code string again (an equivalent program is fine, just not an identical one), and should also cost some non-zero number of points. Unfortunately, Quirrell can trivially generate a vast array of functionally identical programs with different source strings, which makes “hold” a problematic choice. I don’t see how to ban that without solving the general program-equivalence problem, which is at least as hard as the halting problem.
If holding costs nothing, write “if score > 2^100, walk away, else if p is equivalent to this, double down, else hold”. Then tell Quirrell that you’ll only accept that program for your first move, and will hold until he produces it. Congratulations, you now have an exceedingly boring stalemate.
I don’t see a way to make the hold option interesting.
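For concreteness, here is a rough Python sketch of the zero-cost-hold strategy quoted above. It is only an illustration under stated assumptions: the names `Move`, `make_policy` and `THRESHOLD` are made up here, and exact string equality stands in for “equivalent to this”, since real program equivalence cannot be decided in general.

```python
from enum import Enum


class Move(Enum):
    WALK_AWAY = "walk away"
    DOUBLE_DOWN = "double down"
    HOLD = "hold"


# Threshold taken from the quoted strategy: walk away once score > 2^100.
THRESHOLD = 2 ** 100


def make_policy(own_source: str):
    """Build the stalemate policy for a program whose source is `own_source`.

    Assumption: "p is equivalent to this" is approximated by exact
    string equality, which is the only equivalence check that is
    trivially decidable.
    """

    def policy(score: int, proposed_source: str) -> Move:
        if score > THRESHOLD:
            return Move.WALK_AWAY
        if proposed_source == own_source:
            return Move.DOUBLE_DOWN
        # Holding is free in this variant, so reject everything else.
        return Move.HOLD

    return policy
```

With this policy, any program Quirrell offers that differs from `own_source` is met with a hold forever, which is the stalemate described above.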
Well, as you have pointed out (I mean: http://lesswrong.com/lw/e4e/an_angle_of_attack_on_open_problem_1/7862 ), we are probably already dealing with non-real-line utilities. So we could just lose one hold point per hold.
Also, we could require Quirrell to present each source code string infinitely many times.
This would remove stalemates where Quirrell never offers some string at all, and would give us some incentive to accept as many programs as we can verify.
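A small Python sketch of how these two rules might look, under assumptions that are mine rather than part of the proposal: “non-real-line utilities” is read as a lexicographic pair `(points, hold_points)`, so no number of holds outweighs a single ordinary point, and Quirrell’s schedule is modelled as a simple dovetailed enumeration. The names `all_strings`, `infinitely_often` and `hold` are hypothetical.

```python
from itertools import count, islice, product


def all_strings(alphabet: str = "ab"):
    """Enumerate every non-empty string over `alphabet`, shortest first."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def infinitely_often(alphabet: str = "ab"):
    """Offer every string infinitely many times.

    Round k replays the first k strings of the enumeration, so each
    string reappears in every round after the one that introduces it.
    """
    for k in count(1):
        yield from islice(all_strings(alphabet), k)


def hold(score: tuple[int, int]) -> tuple[int, int]:
    """Holding keeps the main points and costs one hold point.

    Python compares tuples lexicographically, so hold points only
    break ties between equal main scores.
    """
    points, hold_points = score
    return (points, hold_points - 1)
```

Because each string keeps coming back, holding on a program delays it rather than refusing it for good, while every hold still makes the score strictly worse in the tie-breaking coordinate.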