A Paperclip maximizer might wipe out humanity, then catch up on its reflective consistency, look back, notice that there was a counterfactual future where a FAI is built, allot some of the collective preference to humanity, and restore it from the info remaining after the initial destruction (effectively constructing a FAI in the process).
This seems fishy to me, given the vast space of possible preferences and the narrowness of the target. Assuming your idea of preference compromise as the convergent solution, what weighting might a reflective AI give to all of the other possible preference states, especially given the mutually exclusive nature of some preferences? If there’s any Occam prior involved at all, something horrifically complicated like human moral value just isn’t worth considering for Clippy.
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could’ve been launched. There supposedly is a nontrivial chance of getting to FAI, so it’s a nontrivial portion of Paperlipper’s cooperation. FAI gets its share because of (justified efficacy of) our efforts for creating FAI, not because of being on some special metaphysical place, and even not because of the relation of its values to human origin, as humans in themselves claim no power.
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could’ve been launched.
Mind projection fallacy? How are these probabilities calculated, and on what prior information? Even if the AI can look back on the past and properly say in some sense that there was a such-and-this a probability of some FAI project succeeding, can’t it just the same look still further back and say there was such-and-that a probability of humanity never evolving in the first place? This just brings us back to the problem orthonormal mentions: our preferences are swamped by the vastness of the space of all possible counterfactual preferences.
You don’t care about counterfactual preferences, you only care about the bearers of these conterfactual preferences being willing to help you, in exchange for you helping them.
It might well be that prior to the first AGI, the info about the world is too sparse or scrambled to coordinate with counterfactual AGIs, for our AGI to discern what’s to be done for the others to improve the possible outcome for itself. Of those possibilities, most may remain averaged out to nothing specific. Only if the possibility of FAI is clear enough, will the trade take form, and sharing common history until recently is a help in getting the clear info.
This seems fishy to me, given the vast space of possible preferences and the narrowness of the target. Assuming your idea of preference compromise as the convergent solution, what weighting might a reflective AI give to all of the other possible preference states, especially given the mutually exclusive nature of some preferences? If there’s any Occam prior involved at all, something horrifically complicated like human moral value just isn’t worth considering for Clippy.
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could’ve been launched. There supposedly is a nontrivial chance of getting to FAI, so it’s a nontrivial portion of Paperlipper’s cooperation. FAI gets its share because of (justified efficacy of) our efforts for creating FAI, not because of being on some special metaphysical place, and even not because of the relation of its values to human origin, as humans in themselves claim no power.
Mind projection fallacy? How are these probabilities calculated, and on what prior information? Even if the AI can look back on the past and properly say in some sense that there was a such-and-this a probability of some FAI project succeeding, can’t it just the same look still further back and say there was such-and-that a probability of humanity never evolving in the first place? This just brings us back to the problem orthonormal mentions: our preferences are swamped by the vastness of the space of all possible counterfactual preferences.
You don’t care about counterfactual preferences, you only care about the bearers of these conterfactual preferences being willing to help you, in exchange for you helping them.
It might well be that prior to the first AGI, the info about the world is too sparse or scrambled to coordinate with counterfactual AGIs, for our AGI to discern what’s to be done for the others to improve the possible outcome for itself. Of those possibilities, most may remain averaged out to nothing specific. Only if the possibility of FAI is clear enough, will the trade take form, and sharing common history until recently is a help in getting the clear info.