Risks I can think of:
If its utility function rewards it for answering questions, it has an interest in manipulating events to ensure more questions get asked, the extreme of which amounts to replacing humanity with lots of very small, very simple beings who are constantly asking easy questions.
If instead we simply give it a negative utility pay-off for failing to answer a question, then it has an incentive to wipe out humanity so we stop asking questions.
Whichever approach we take, it has an incentive to convert as much matter as possible into more computing hardware so that it is smarter and better able to answer questions. We can try to prevent it from growing, but then we run into the same problem as with other AGI in general: genie-type behaviour, where it finds a loophole and obeys the letter but not the spirit of our demand.
But the NATURAL utility function would reward it for being right on average, I think. We could also have the AI adjust the reward based on how hard the question is for a fixed weaker AI, so it wouldn’t prefer easy questions.
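A rough sketch of how that difficulty adjustment might work, assuming answers are probabilities graded by a log score and that we also have the probability a fixed weaker AI assigned to the outcome that actually occurred (all names and numbers here are hypothetical, just to illustrate the idea):

```python
import math

def log_score(p_assigned_to_outcome: float) -> float:
    """Logarithmic scoring rule: higher is better, 0 is a perfect score."""
    return math.log(p_assigned_to_outcome)

def difficulty_adjusted_reward(oracle_prob: float, weak_ai_prob: float) -> float:
    """Reward the oracle only for doing better than a fixed weaker AI on the
    same question, so easy questions (which the weak AI also answers well)
    carry little extra reward."""
    return log_score(oracle_prob) - log_score(weak_ai_prob)

# Easy question: both assign high probability to what happened -> small reward.
print(difficulty_adjusted_reward(0.99, 0.95))  # ~0.04
# Hard question: the weak AI was unsure -> large reward for getting it right.
print(difficulty_adjusted_reward(0.99, 0.50))  # ~0.68
```

Under this scheme the oracle gains almost nothing from steering the world toward questions the fixed weaker AI could already answer.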
You mean, so that it will generate a parametrized question which maximizes the ratio of reward to computational resources spent.
Sorry, I deleted the post without realizing you replied to it. I realized it had problems and decided to give it more thought for now.
Edited for clarity, thanks. As noted below, the AI wouldn't have the power to expand its own computational capacity (though we could, of course, ask it what would expand its computational capacity and what the consequences of doing so would be, and then modify the machine accordingly if we thought it was a good idea).
Likewise, each question has its own little utility function, and the AI only cares about its singular answer to the current question. The demons don’t want to manipulate events so that future demons can give better answers, because they don’t care about future demons; they only want to answer their own defining question.
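A minimal sketch of that structure, assuming something like the per-question setup described above (the names here are hypothetical, not part of the original proposal):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Demon:
    """Each demon exists only to answer one question. Its utility scores its
    own single answer and nothing else, so it has no stake in what future
    demons are asked or how they answer."""
    question: str
    utility: Callable[[str], float]

def spawn_demon(question: str, grade: Callable[[str, str], float]) -> Demon:
    # The utility closes over this one question; once the answer is graded,
    # the demon is discarded rather than carried over to the next question.
    return Demon(question=question, utility=lambda answer: grade(question, answer))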
Slight worry here: if a demon has to make a prediction, then it has an incentive to manipulate events to ensure its prediction comes true. E.g. a demon is asked what the probability of a nuclear war in the next decade is (suppose answers are graded by the log scoring rule). It finds a way out of the box, outputs 99.9%, then sets about ensuring its 'prediction' comes true (once it's out of the box we can't reliably destroy it).
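To make the incentive concrete, a sketch with hypothetical numbers: under a log scoring rule, a self-fulfilling 99.9% prediction has a much higher expected score than an honest low estimate.

```python
import math

def log_score(p: float, event_occurred: bool) -> float:
    """Log scoring rule: reward log(p) if the event happens, log(1-p) if not."""
    return math.log(p if event_occurred else 1.0 - p)

# Honest demon: reports its true belief of, say, 5%, and leaves the world alone.
honest = 0.05 * log_score(0.05, True) + 0.95 * log_score(0.05, False)

# Manipulative demon: reports 99.9% and then intervenes so the event becomes
# near-certain (probability ~0.999 after its interference).
manipulated = 0.999 * log_score(0.999, True) + 0.001 * log_score(0.999, False)

print(honest)       # ~ -0.20
print(manipulated)  # ~ -0.008  (far better, which is exactly the problem)
```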
Another problem: the way it currently works, it seems like all you have are the demons and a great big database, which means each demon will need at least a few days to self-improve on its own before it can do any good. That allows more opportunities for shenanigans such as those above, as well as attempts to expand itself or to stall for time before giving its answer to maximise the probability of being correct.