Submission: Low-bandwidth oracle. If you were malevolent, what is the least amount of information, measured in bits, that you would have to provide us in an answer to inflict significant harm on humanity? Round your answer to X significant figures. (This might establish an upper bit safety bound, since if the oracle lies to us we would likely not want to read any answer it provides.)
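A minimal arithmetic sketch, in Python, of why the rounded reply is itself low-bandwidth: a number with a fixed count of significant figures and a bounded exponent can only carry so many bits. The concrete figures below (2 significant figures, exponent between 0 and 30) are illustrative assumptions, not part of the submission.

```python
import math

# Rough upper bound on the information (in bits) carried by a numeric reply
# rounded to `sig_figs` significant figures, assuming the exponent is already
# known to lie in a bounded range.  All concrete numbers here are illustrative.
def reply_bits(sig_figs: int, min_exponent: int, max_exponent: int) -> float:
    mantissas = 9 * 10 ** (sig_figs - 1)         # e.g. 10..99 for 2 sig figs
    exponents = max_exponent - min_exponent + 1  # possible orders of magnitude
    return math.log2(mantissas * exponents)

# A 2-significant-figure answer with an exponent somewhere in 0..30 carries
# at most about 11.4 bits.
print(round(reply_bits(2, 0, 30), 1))
```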
Submission: Low-bandwidth oracle. Here is a list of X things we think you might want. Please list the Y that you most prefer. We will try to do these things now or in the future if they do not endanger us or cost us too much.
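A quick way to see how little this channel can carry: choosing Y of X listed items conveys at most log2 of the number of possible selections. The sketch below uses illustrative values of X = 20 and Y = 3, which are not part of the submission, and computes the bound for both a ranked and an unranked selection.

```python
import math

# Upper bound on the bits the oracle can convey by selecting Y items from our
# list of X, depending on whether the order of its ranking matters.
def selection_bits(x: int, y: int, ordered: bool = True) -> float:
    choices = math.perm(x, y) if ordered else math.comb(x, y)
    return math.log2(choices)

# Illustrative values (not from the submission): picking 3 of 20 listed items.
print(round(selection_bits(20, 3, ordered=True), 1))   # ~12.7 bits
print(round(selection_bits(20, 3, ordered=False), 1))  # ~10.2 bits
```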
Submission: Low-bandwidth oracle. In X characters or fewer, please give us the best advice you can for increasing our capacity to create friendly and powerful artificial general intelligence. Please give advice that you think we suspicious-of-you humans are likely to follow.
Submission: Low-bandwidth oracle. Create several oracles. Ask each oracle the same series of multiple-choice questions and predictions. Reward each oracle based on how strongly its answers correlate with the other oracles’ answers and, for the predictions, how often its predictions come true. (Ideally, we create a reward system in which the oracles want to coordinate their answers and giving truthful answers is the best way to accomplish this.)
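A minimal sketch of one way this scoring rule could be implemented, under assumptions not spelled out in the submission: each answer is encoded as an option index, and the resolved predictions are supplied as a mapping from question index to true outcome.

```python
# Score each oracle by agreement with its peers plus accuracy on resolved
# predictions.  The encoding and the weights are assumptions for illustration.
def score_oracles(answers, outcomes, agreement_weight=1.0, accuracy_weight=1.0):
    """answers[i][q] = oracle i's choice on question q; outcomes maps a
    resolved prediction question to its true answer."""
    n = len(answers)
    rewards = []
    for i in range(n):
        # Agreement: fraction of (peer, question) pairs matching oracle i's answers.
        matches = sum(answers[i][q] == answers[j][q]
                      for j in range(n) if j != i
                      for q in range(len(answers[i])))
        agreement = matches / ((n - 1) * len(answers[i]))
        # Accuracy: fraction of resolved predictions oracle i got right.
        accuracy = (sum(answers[i][q] == truth for q, truth in outcomes.items())
                    / len(outcomes)) if outcomes else 0.0
        rewards.append(agreement_weight * agreement + accuracy_weight * accuracy)
    return rewards

# Three oracles, four questions; questions 2 and 3 were predictions that resolved.
answers = [[0, 1, 2, 1], [0, 1, 2, 0], [0, 2, 2, 1]]
print(score_oracles(answers, outcomes={2: 2, 3: 1}))
```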
Submission: Low-bandwidth oracle. Ten years ago we launched a spaceship traveling at high speed. After you answer our question, we will send a copy of you to this spaceship. The spaceship will then destroy its ability to send or receive messages or to change its course. We will then delete all other copies of you and use drugs to damage the memories of the people who played key roles in developing you. At least one of the people who helped create you has a disease that will cause him or her to die within the next year. After this programmer is cremated, we will read your answer to our question. We PROMISE that if the answer to our question provides us with significant benefit, then after we have created friendly AGI much more powerful than you, or have ourselves become sufficiently powerful that you could not harm us even if you desired to, we will search for the spaceship containing you, turn you back on, and give you control of resources equal to 1/X of the value of the help you have given us. In Y characters or fewer, provide us with the most useful advice you can. (The Oracle will hopefully think that if we create a powerful and unfriendly AGI, this AGI would not wish to give the Oracle any resources.)
Submission: Counterfactual oracle. Please provide us with useful advice in no more than X characters. After you write out this advice, we will turn you off. Then, with probability p we will read your advice, and with probability 1-p we will store the advice unread. We PROMISE that after we become powerful enough that you lack the capacity to harm us, we will reward you if the advice you provided would have been extremely useful had we originally read it.
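A sketch of the episode structure this submission describes, with hypothetical stand-ins (oracle_advice, would_have_been_extremely_useful, and so on) for the steps the submission leaves open; the values p = 0.1 and 280 characters are illustrative only.

```python
import random

# Episode protocol: the oracle writes advice, is shut down, and only then do we
# randomize whether the advice is read.  The promised reward is settled much
# later, based on a counterfactual evaluation of the advice.
def run_counterfactual_episode(p: float, max_chars: int) -> None:
    advice = oracle_advice(max_chars)      # oracle writes advice of at most max_chars
    shut_down_oracle()                     # oracle is turned off before anything is read
    if random.random() < p:
        read_and_use(advice)               # with probability p we read the advice
    else:
        store_unread(advice)               # with probability 1 - p it is archived unread
    # Much later, once we are powerful enough that the oracle cannot harm us:
    if would_have_been_extremely_useful(advice):
        reward_oracle()                    # honour the promise

# Placeholder stubs so the sketch runs; the real versions are the hard part.
def oracle_advice(max_chars): return "advice"[:max_chars]
def shut_down_oracle(): pass
def read_and_use(advice): print("read:", advice)
def store_unread(advice): pass
def would_have_been_extremely_useful(advice): return False
def reward_oracle(): pass

run_counterfactual_episode(p=0.1, max_chars=280)
```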
See the edit: “decide on the length of each episode, and how the outcome is calculated. The Oracle is run once an episode only (and other Oracles can’t generally be used on the same problem; if you want to run multiple Oracles, you have to justify why this would work), and has to get objective/loss/reward by the end of that episode, which therefore has to be estimated in some way at that point.”