This might make a fun game actually.
Player One sets up a hypothetical AI (utility function + restraints), and Player Two, or the rest of the group, must find a situation where a superintelligence with that utility function would lead to a dystopia. Here's my entry:
1) Utility is obtained by 'votes'. Every person can give either +1 or −1 utility to the AI each day. To avoid uncertainty over what constitutes a person, everyone is given a voting button before the AI is turned on; any additional voting buttons created later provide no utility.
2) Strongly discount future votes, weighting a vote on day t by 1/2^t, where t is time in days. This way a constant vote for all time yields a finite utility, since the geometric series 1 + 1/2 + 1/4 + … converges to 2, so no single voter can ever be worth more than 2 units of utility in total (see the sketch after this list).
3) The AI cannot take any action until one year after it has been turned on, except for the text outputs needed to satisfy (4).
4) The AI has an oracle component. Any voter can ask how their life on a particular future day will differ from the present day. The AI must answer to the best of its ability.
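Rules (1) and (2) can be made concrete with a small sketch. The names below (total_utility, registered_buttons, the "alice" button) are my own illustrative inventions, not part of the rules; the point is just that only pre-registered buttons count and that the 1/2^t discount bounds each voter's lifetime contribution at 2:

```python
# Hypothetical sketch of the voting utility from rules (1) and (2).
# Only buttons issued before the AI is turned on count; each casts +1 or -1 per day,
# and a vote on day t is worth 1 / 2**t, so each voter's total is bounded by 2.

def total_utility(daily_votes, registered_buttons):
    """daily_votes: list indexed by day t of {button_id: +1 or -1} dicts."""
    utility = 0.0
    for t, votes in enumerate(daily_votes):
        day_total = sum(v for button_id, v in votes.items()
                        if button_id in registered_buttons)  # extra buttons give no utility
        utility += day_total / 2 ** t                        # rule (2): discount by 1/2^t
    return utility

# A single voter pressing +1 forever approaches, but never exceeds, 2 units of utility:
buttons = {"alice"}
votes = [{"alice": +1} for _ in range(50)]
print(total_utility(votes, buttons))  # ~2.0
```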
I ask it a question as in (4), and it tells me "build this machine [specifications included] and you will get [something the AI believes it can tempt me with]". It does its utmost to persuade me. Unknown to me, the machine it wants me to build takes command of all the voting buttons and jams them so they give the AI +1 every day.
I don’t think Player One can win this game.
I should have been clearer about what I meant by an oracle. The only utility the AI takes into account when answering a question is accuracy: it must be truthful, and it only answers questions about differences between a voter's days. A valid question could be something like "A year from now, if you do [horrendous act], would my voter button still vote you down?" If not, I'd vote the AI down for the whole year until it begins acting.
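Here is a rough sketch of this clarified oracle rule and the precommitment strategy it enables, under my own assumptions about how the answer is reported; the Oracle and OracleAnswer types and the day-400 query are hypothetical illustrations, not part of the rules:

```python
# Hypothetical sketch of the clarified oracle rule and the voter strategy above.
# The oracle is scored only on accuracy: it answers truthfully, and only about how a
# particular future day differs from the present day for the voter who asked.

from dataclasses import dataclass

@dataclass
class OracleAnswer:
    button_still_votes_down: bool   # would my button still be voting the AI down?
    description: str                # truthful summary of other differences

class Oracle:
    """Stand-in for the AI's oracle component; a real one would derive its answer
    from its world model, but it is constrained to report truthfully."""
    def ask(self, voter_button: str, future_day: int) -> OracleAnswer:
        return OracleAnswer(button_still_votes_down=True,
                            description="no tampering with your button foreseen")

def daily_vote(oracle: Oracle, my_button: str) -> int:
    # Precommitment from the post: query a day after the one-year no-action window;
    # if the truthful answer says my button would no longer vote the AI down
    # (e.g. it has been hijacked), vote -1 every day until the AI begins acting.
    answer = oracle.ask(voter_button=my_button, future_day=400)
    return +1 if answer.button_still_votes_down else -1

print(daily_vote(Oracle(), "alice"))  # +1 under this toy oracle's answer
```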
While I recognize the huge threat of unfriendly AI, I’m not convinced Player One can’t win. I’d like to see the game played on a wider scale (and perhaps formalized a bit) to explore the space more thoroughly. It might also help illuminate the risks of AI to people not yet convinced. Plus it’s just fun =D