In an idealized form, I agree with you.
That is, if I really take the CEV idea seriously as proposed, there simply is no way I can prefer CEV(me + X) to CEV(me). If it turns out that I would, if I knew enough and thought about it carefully enough and “grew” enough and so on, care about other people’s preferences (either in and of themselves, as in “I hadn’t thought of that, but now that you point it out I want that too,” or by reference to their owners, as in “I don’t care about that, but if you do then fine, let’s have that too,” for which distinction I bet there’s a philosophical term of art that I don’t know), then the CEV-extraction process will go ahead and optimize for those preferences as well, even if I don’t actually know what they are or currently care about them; even if I currently think they are a horrible, evil, bad, no-good idea. (I might be horrified by that result, but presumably I should endorse it anyway.)
This works precisely because the CEV-extraction process as defined depends on an enormous amount of currently-unavailable data in the course of working out the target’s “volition” given its current desires, including entirely counterfactual data about what the target would want if exposed to various idealized and underspecified learning/“growing” environments.
That said, the minute we start talking instead about some actual realizable thing in the world, some approximation of CEV(me) computable by a not-yet-godlike intelligence, it stops being quite so clear that all of the above is true.
An approximate-CEV extractor might find things in your brain that I would endorse if I knew about them (given sufficient time and opportunity to discuss them with you and “grow” and so forth) but that it couldn’t actually have computed with just my brain as its target, in which case pointing it at both of us might be better (in my own terms!) than pointing it at just me.
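To make that asymmetry concrete, here is a minimal toy sketch in Python. It is my own illustration, not anything from the CEV writeup: brains are modeled as sets of preference labels, and the names (`would_endorse`, `fairness_for_them`, and so on) are invented for the example. The idealized extractor can reach any preference I would endorse on reflection wherever it lives; the bounded one can only work with the brains it is actually pointed at.

```python
# Toy model: a "brain" is just a set of preference labels, and an extractor
# can only extrapolate from whatever preferences it can actually reach.

def idealized_cev(targets, everyone, would_endorse):
    """Unbounded extractor: reaches every preference the targets would endorse
    on reflection, no matter whose brain it happens to live in."""
    return {p for brain in everyone for p in brain
            if any(would_endorse(t, p) for t in targets)}

def approximate_cev(targets, would_endorse):
    """Bounded extractor: can only work with preferences found in the brains
    it is actually pointed at."""
    return {p for brain in targets for p in brain
            if any(would_endorse(t, p) for t in targets)}

# Hypothetical setup: "fairness_for_them" is something I would endorse if I
# knew about it, but it is only encoded in your brain, not mine.
me = frozenset({"my_projects"})
you = frozenset({"your_projects", "fairness_for_them"})

def would_endorse(brain, preference):
    # Stipulated for the example; this stands in for the whole "knew more,
    # thought about it more carefully, grew more" process.
    if brain == me:
        return preference in {"my_projects", "fairness_for_them"}
    return preference in brain

print(approximate_cev([me], would_endorse))           # misses 'fairness_for_them'
print(approximate_cev([me, you], would_endorse))      # recovers it
print(idealized_cev([me], [me, you], would_endorse))  # already includes it
```

The point of the toy is just that the bounded extractor pointed only at me misses something that idealized CEV(me) would have delivered, while pointing it at both of us recovers it, which is the “better in my own terms” case above.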
It comes down to a question of how much we trust the seed AI that’s doing the extraction to actually solve the problem.
It’s also perhaps worth asking what happens if I build the CEV-extracting seed AI and point it at my target community and it comes back with “I don’t have enough capability to compute CEV for that community. I will have to increase my capabilities in order to solve that problem.”