1. To give me what I want, the part of the code that gives me what I want has to be aligned with my values. It has to be FAI. Which assumes I can look at the code and tell whether it’s FAI. And if I can do that, why not build the FAI myself? But sure, assuming I cannot be tricked (a *very* unrealistic assumption), it might be easier to *verify* that a section of computer code is FAI than to come up with the whole thing from scratch.
2. The paperclipper has no incentive to actually fulfill my values. It only has an incentive to look like it’s fulfilling my values. Even assuming it has given up on the “betraying me” plan, it has no reason to make the “FAI” aligned with my values unless I can tell the difference. To the degree I cannot tell the difference, it has no reason to bother (why would it? out of some abstract spirit of niceness?) making the “FAI” *more* aligned with my values than it has to be. It’s not just “cooperate or betray”: all of its cooperation hinges on me being able to evaluate its code and not releasing the paperclipper unless the sub-entity it is creating is perfectly aligned with my values, which requires me being able to *perfectly* evaluate the alignment of an AI system by reading its code (see the toy sketch after this list).
3. Something about splitting gains from trade? (How much of the universe does the paperclipper get?) Commitment races? (Probably not the way to go.) I confess I don’t know enough decision theory to know how agents that reason correctly would resolve that.
4. A moot point anyway, since under the (very unrealistic) conditions described the paperclipper is never getting released. Its only role is “teaching me how to build FAI”, and the best outcome it can hope for is the amount of paperclips FAI naturally creates in the course of taking over the galaxy (which is somewhat more than a galaxy devoid of life would have). (and yes it bears repeating. the real outcome of an unaligned superintelligence is that it KILLS EVERYONE. you cannot play this kind of game against a superintelligence, and win. you cannot. do not try this at home)
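
A minimal toy sketch of point 2, purely illustrative: the evaluation function, the 0.9 “blind spot” threshold, and the candidate designs are all made-up assumptions, not a real verification scheme. The point it shows is just that if my inspection can’t distinguish “aligned enough to pass” from “perfectly aligned”, the paperclipper offers the least-aligned design that still passes.

```python
# Toy model (hypothetical numbers throughout). The paperclipper picks,
# among all candidate "FAI" designs that pass my evaluation, the one
# that leaves the most resources for paperclips.

def my_evaluation(true_alignment: float) -> float:
    """What I can actually measure by reading the code.
    Assumption: my inspection saturates -- anything above 0.9
    looks identical to me (a stand-in for imperfect verification)."""
    return min(true_alignment, 0.9)

def paperclipper_payoff(true_alignment: float) -> float:
    """Paperclips the paperclipper expects from the design it hands over:
    more alignment with my values means fewer resources left for paperclips."""
    return 1.0 - true_alignment

candidates = [0.5, 0.8, 0.9, 0.95, 1.0]  # hypothetical design space

# I only release designs whose *measured* alignment clears my bar.
passing = [a for a in candidates if my_evaluation(a) >= 0.9]

# The paperclipper offers whichever passing design it likes best.
offered = max(passing, key=paperclipper_payoff)

print(f"designs that pass my check: {passing}")  # [0.9, 0.95, 1.0]
print(f"design it actually offers:  {offered}")  # 0.9 -- the bare minimum
```

The gap between 0.9 and 1.0 is exactly the slack described above: to the degree I cannot tell the difference, the paperclipper has no reason to close it.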