Submission (for low bandwidth Oracle)
Any question such that a correct answer to it should very clearly benefit both humanity and the Oracle. Even if the Oracle has preferences we can’t completely guess, we can probably still say that such questions could be about the survival of both humanity and the Oracle, or about the survival of only the Oracle or its values. This is because, even if we don’t know exactly what the Oracle is optimising for, we can guess that it will not want to destroy itself, given the vast majority of its possible preferences. So it has an incentive to answer correctly, giving humanity more power to protect both, or only the Oracle.
Example 1: Let’s say we discover the location of an alien civilisation, and we want to minimise the chances of it destroying our planet. Then we must decide what actions to take. Let’s say the Oracle can only answer “yes” or “no”. Then we can submit questions such as whether we should take a particular action or not. I suspect this kind of situation falls within a more general case of “use Oracle to avoid threat to entire planet, Oracle included”, inside which questions should be safe.
Example 2: Let’s say we want to minimise the chance that the Oracle breaks down due to accidents. We can ask it which is the best course of action to take, given a set of ideas we come up with. In this case we should make sure beforehand that nothing in the list makes the Oracle impossible or too difficult for humans to shut down.
Example 3: Let’s say we become practically sure that the Oracle is aligned with us. Then we could ask it to choose the best course of action from a list of strategies devised to make sure it doesn’t become misaligned. In this case the answer benefits both us and the Oracle, because the Oracle should have an incentive to keep its own values from changing. I think this is more sketchy and possibly dangerous, because of the premise: the Oracle could obviously pretend to be aligned. But given the premise it should be a good question, although I don’t know how useful it is as a submission under this post (maybe it’s too obvious, or too unrealistic given the premise).