I’ve thought of a bit of intuition here; maybe someone will benefit from it or be kind enough to critique it:
Say you took two (sufficiently identical) copies of that person, C1 and C2, exposed C1 to the wirehead situation (by plugging C1 in), and showed C2 what was happening to C1.
It seems likely that C1 would want to remain in the situation and C2 would want to remove C1 from the wirehead device. This seems to be the case even if the wirehead machine doesn’t raise dopamine levels very much, so that the user does not become dependent on it.
However, even in the pleasure-maximizing scenario you have a range of possible futures; any future where abolitionism is carried out is viable. Of course, a pure pleasure maximizer would probably force the most resource-efficient way to induce pleasure for the largest number of people for the longest amount of time, so wireheading is a likely outcome.
That said, it seems like some careful balancing of the desires of C1 and C2 would allow for a mutually acceptable outcome that they would both move toward: an abolitionist future with plenty of novelty (a fun place to live, in other words).
This is roughly what I currently envision CEV doing: run a simulated partial individual, C1, across a broad range of possible futures and detect whether distance is occurring by testing against C2, a second copy of C1’s initial state. Distance would be present if C2 is put off by, or does not comprehend, the desires of C1, even though C1 prefers its current state to most other possible states. To resolve the distance issue, CEV would then take the set of highest-utility futures that C1 is run on and choose the one that appears most preferable to C2 (which represents C1’s initial state). Assuming you have a metric for distance, CEV could then accept or reject that outcome depending on whether the distance falls under a certain threshold.
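To make that procedure concrete, here is a minimal, purely illustrative sketch in Python. Every name in it is a hypothetical stand-in rather than anything from CEV itself: `evolve` is whatever process runs C1 on a candidate future, `utility` is C1’s own post-experience evaluation of where it ends up, `preference` is how appealing that evolved outcome looks to the unmodified copy C2, and `distance` is the assumed metric between C2 and the evolved C1.

```python
# Purely illustrative sketch of the check described above.
# All callables are hypothetical stand-ins supplied by the caller.

def cev_check(c1_initial, futures, evolve, utility, preference, distance,
              top_k=10, threshold=1.0):
    c2 = c1_initial  # second copy: a frozen snapshot of the initial state

    # Run C1 on each candidate future and score the result by C1's own
    # (post-experience) utility.
    evolved = [(future, evolve(c1_initial, future)) for future in futures]
    evolved.sort(key=lambda pair: utility(pair[1]), reverse=True)

    # Among the highest-utility futures for C1, pick the one that the
    # unmodified copy C2 finds most preferable.
    candidates = evolved[:top_k]
    best_future, best_state = max(candidates,
                                  key=lambda pair: preference(c2, pair[0]))

    # Accept only if the drift between C2 (the initial state) and the
    # evolved C1 stays under the assumed distance threshold.
    if distance(c2, best_state) <= threshold:
        return best_future
    return None  # reject: too much distance
```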
Does this seem sensible?
Sensible, maybe, but pointless in my opinion. Once you have C1’s approval, any additional improvements (wouldn’t C2 like to see what C3 would be like?) would be judged from C2’s perspective, which naturally would be different from C1’s perspective, and turtles all the way down. So it would be deceptive to C1 to present him with C2’s results: if any incremental happiness were still possible, C2 would naturally harbor the same wish for improvement that caused C1 to accept it. All it would be doing is shielding C1’s virgin imagination from C5822.
I’m not sure you’re talking about the same thing I am, or maybe I’m just not following you? There are only C1 and C2. C2 serves as a grounding: it checks whether what C1 would pick, given the experiences C1 went through, is acceptable to C1’s initial state (which C2 represents). C1 would not have the “virgin imagination”; it would be the one hooked up to the wirehead machine.
Really I was thinking about the “Last Judge” idea from CEV, which (as I understand it, though it is quite vague, so maybe I don’t) basically has someone peek at the solution CEV produces and decide from the outside whether the outcome is acceptable.
Aside from my accidental swapping of the terms (C1 as the judge, rather than C2), I still stand by my (possibly unclear) opinion. In the situation you are describing, the “judge” would never allow the agent to change beyond the very small distance the judge is comfortable with, and additional checks would never be necessary, since the judge’s opinion would logically be the same every time an improvement was considered. Whichever state the judge finds acceptable the first time should become the new base state for the judge. Similarly, in real life, you don’t hold your current preferences to the standards you had when you were five years old. The gradual improvements in cognition usually justify the risks of updating one’s values, in my opinion.
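To illustrate the difference being argued for, here is a toy sketch with hypothetical helper functions: `propose_next(state)` yields a candidate improvement on a state, and `acceptable(base, candidate)` is the judge’s comfort test relative to a base state. With a fixed base, total change is capped at whatever the original judge tolerates; if each accepted state becomes the new base, gradual value updates can accumulate while every individual step stays comfortable.

```python
# Toy illustration only; propose_next and acceptable are hypothetical stand-ins.

def improve_with_fixed_judge(initial, propose_next, acceptable, steps=100):
    """The judge always compares candidates against the original state, so the
    agent can never move beyond whatever the original judge is comfortable with."""
    state = initial
    for _ in range(steps):
        candidate = propose_next(state)
        if not acceptable(initial, candidate):  # always judged from the initial state
            break
        state = candidate
    return state


def improve_with_updating_judge(initial, propose_next, acceptable, steps=100):
    """Each accepted state becomes the judge's new base state, so gradual value
    updates can accumulate while every single step remains comfortable."""
    base = initial
    for _ in range(steps):
        candidate = propose_next(base)
        if not acceptable(base, candidate):  # judged from the most recent accepted state
            break
        base = candidate  # the accepted state becomes the new base
    return base
```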