The dichotomy “paperclip maximizer vs. Friendly AI” seems like a false dichotomy—I imagine that the sort of AI people would actually build would be somewhere in the middle. Any recommended reading on this point would be appreciated.
Mainly Complexity of value. There is no way for human values to magically jump inside the AI, so if it’s not specifically created to reflect them, it won’t have them; and whatever the AI ends up with won’t come close to human values, because human values are too complex for any structure that happens to form in the AI to resemble them.
The more the AI’s preference diverges from ours, the more we lose, and this loss is on an astronomical scale (even if the preference diverges relatively little). The falloff with imperfect reflection of values might be so sharp that any ad-hoc solution renders the future worthless. Or maybe not, with certain classes of values that contain a component of sympathy, which reflects values perfectly while giving them smaller weight in the overall game, but then we’d want to technically understand this “sympathy” to have any confidence in the outcome.
This depends on something like aggregative utilitarianism. If additional resources have diminishing marginal value in fulfilling human aims, then getting a little slice of the universe (in the course of negotiating terms of surrender with the inhuman AI, if it can make credible commitments, or because we serve as acausal bargaining chips with other civilizations elsewhere in the universe) may be enough. Is getting 100% of the lightcone a hundred times better than 1%?
I think yes, if we take into account that the more of the lightcone we (our FAI) get, the more trading opportunities we would have with UFAI in other possible worlds. Diminishing marginal value shouldn’t apply across possible worlds, because otherwise it would imply gross violations of expected utility maximization.
Also, I suspect that there are possible worlds with much greater resources than our universe (perhaps with physics that allow hypercomputation, or just many orders of magnitude more total exploitable resources), and some of them would have potential trading partners who are willing to give us a small share of their world for a large share of ours. We may eventually achieve most of our value from trading with them. But of course such trade wouldn’t be possible if we didn’t have something to trade with!
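A toy calculation may make the earlier question (“Is getting 100% of the lightcone a hundred times better than 1%?”) and the point about cross-world linearity concrete. This is only an illustrative sketch: the resource count and the two utility shapes are assumptions for the example, not anything proposed in the thread.

```python
import math

TOTAL_RESOURCES = 1e9  # hypothetical count of usable resource units in the lightcone

def log_utility(share):
    # Assumed within-world utility with diminishing returns to resources.
    return math.log(1 + share * TOTAL_RESOURCES)

def linear_utility(share):
    # Assumed within-world utility with no diminishing returns.
    return share * TOTAL_RESOURCES

# Within a single world: is 100% of the lightcone a hundred times better than 1%?
print(log_utility(1.0) / log_utility(0.01))        # ~1.3 under log utility
print(linear_utility(1.0) / linear_utility(0.01))  # exactly 100.0 under linear utility

# Across possible worlds, expected utility is linear in probability regardless of
# the within-world shape: a probability p of an outcome is worth exactly p times
# the utility of getting it for certain, so "diminishing returns across worlds"
# would amount to abandoning expected utility maximization.
p = 0.01
print(p * log_utility(1.0))  # value of a 1% chance of the whole lightcone
```

Which of these within-world shapes better describes human preferences is exactly what the rest of the thread disputes.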
Interesting. This suggests thinking about FAI not as using its control to produce terminal value in its own world, but as using its control to buy as much terminal value as it can, in various world-programs. Since it doesn’t matter where the value is produced, most of the value doesn’t have to be produced in the possible worlds with FAIs in them. Indeed, it sounds unlikely that specifically the FAI worlds will be optimal for FAI-value optimization. FAIs (and the worlds they control) act as instrumental leverage, a way of steering the global mathematical universe toward having more value for our preference.
Thus, more FAIs mean stronger control over the mathematical universe, while more UFAIs mean that the mathematical universe is richer, so the FAIs can get more value out of it with the same control. The metaphors of trade and comparative advantage start applying again, not on the naive level of cohabitation in the same world, but on the level of the global ontology. Mathematics grants you total control over your domain, so that your “atoms” can’t be reused for something else by a stronger agent, and so you do benefit from most superintelligent “aliens”.
Yes, assuming that trading across possible worlds can be done in the first place. One thing that concerns me is the combinatorial explosion of potential trading partners. How do they manage to “find” each other?
It’s the same combinatorial explosion as with the future possible worlds. Even though you can’t locate individual valuable future outcomes (through particular instrumental sequences of exact events), you can still make decisions about your actions leading to certain consequences “in bulk”, and I expect that trade between possible worlds can be described similarly (after all, it works on exactly the same decision-making algorithm). Thus, you usually won’t know exactly who you are trading with, but you estimate on net that your actions are in the right direction.
Isn’t the set of future worlds with high measure a lot smaller?
I currently agree it’s a bad analogy and I no longer endorse the position that global acausal trade is probably feasible, although its theoretical possibility seems to be a stable conclusion.
Robin Hanson would be so pleased that it turns out economics is the fundamental law of the entire ensemble universe.
There are two distinct issues here: (1) how highly would a human with the original preference value a universe which gives only a small weight to that preference, and (2) how likely is the changed preference to give any weight whatsoever to the original preference, in other words to produce a universe to any extent valuable to the original preference, even if the original preference values universes only weakly optimized in its direction.
Moving to a different preference is different from lowering the weight of the original preference. A slightly changed (formal definition of) preference may put no weight at all on the preceding preference. The optimal outcome according to the modified preference can thus be essentially moral noise, paperclips, to the original preference. Giving a small slice of the universe, on the other hand, is what you get out of aggregation of preferences, and a changed preference doesn’t necessarily have a form of aggregation that includes the original preference. (That said, there is hope that human-like preferences include sympathy, which does make them aggregate the preferences of other persons with some weight.)
We should assign some substantial probability to getting some weighting of our preference (from bargaining with transparency, acausal trade, altruistic brain emulations, etc.). If a moderate weighting of our preferences gets most of the potential utility, then the expected utility of inhuman AIs getting powerful won’t be astronomically less than the expected utility of, e.g., a ‘Friendly AI’ acting on idealized human preferences.
Game-theoretic considerations are only relevant when you have non-trivial control, not when your atoms are used for something else. If the singleton’s preference gives some weight to your preference, this is a case of having control directly through the singleton’s preference, but the origin of this control is not game-theoretic. If the singleton’s preference has sympathy for your preference, your explicit instantiation in the world doesn’t need to have any control in order to win through the implicit control via the singleton’s preference.
Game-theoretic aggregation, on the other hand, doesn’t work by influence on the other agent’s preference. You only get your slice of the world because you already control it. Another agent may perform trade, but this is a trade of control, rearranging what specifically each of you controls, without changing your preferences.
I assume that control will be winner-takes-all, so the preferences of other agents existing at the time only matter if the winner’s preference directly pays any attention to their preferences, not if they merely had some limited control from the start.
My point is that an inhuman AI may give no weight to our preference, while an FAI may give at least some weight to everyone’s preference. Game-theoretic trade won’t matter here because agents other than the singleton have no control to bargain with. FAI gives weight to other preferences not because of trade, but by construction from the start, even if the people it gives weight to don’t exist at all (FAI giving them weight in optimization might cause them to appear, or bring about some other event at least as good from their perspective).
Re: “You only get your slice of the world because you already control it.”
This isn’t obviously the most natural way to describe a scenario in which an AI thinks it has a 90% chance of winning a conflict with humanity, but also has the ability to jointly create (with humanity) agents to enforce an agreement (and can do this quickly enough to be relevant), and so cuts a deal splitting up the resources of the light cone at a 9:1 ratio.
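A rough expected-resources calculation, with assumed and purely illustrative numbers for the win probability and the destruction caused by conflict, shows why such an enforceable split can beat fighting for both sides:

```python
WIN_PROB = 0.9        # the AI's assumed chance of winning an all-out conflict
CONFLICT_LOSS = 0.05  # assumed fraction of lightcone resources destroyed by fighting

# Expected share of the lightcone's resources under each option,
# treating both sides as risk-neutral and as valuing resources linearly.
ai_fight    = WIN_PROB * (1 - CONFLICT_LOSS)        # 0.855
human_fight = (1 - WIN_PROB) * (1 - CONFLICT_LOSS)  # 0.095
ai_deal, human_deal = 0.9, 0.1                      # the enforceable 9:1 split

assert ai_deal > ai_fight and human_deal > human_fight
# With any deadweight loss from conflict, a split near the win-probability ratio
# leaves both parties better off in expectation than fighting.
```

The point is only that any deadweight loss from conflict, plus enforceable agreements, creates room for a deal near the win-probability ratio.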
Re: “I assume that control will be winner-takes-all.”
Given that there are plausible sets of parameter values where this assumption is false, we can’t use it to assess overall expected value to astronomical precision.
Re: “Game-theoretic considerations are only relevant when you have non-trivial control.”
I specifically mentioned acausal trade, à la Rolf Nelson’s AI-deterrence scheme, which needs non-trivial control only in some region of the ensemble of possibilities the AI considers. Indeed, the AI might treat us well simply because of the chance that benevolent non-human aliens will respond positively if its algorithm has this output (as the benevolent aliens might be modeling the AI’s algorithm).
Yes, I forgot about that (though I remain uncertain about how well this argument works, not having worked out a formal model). To summarize the arguments for why the future is still significantly more valuable than what we have now, even if we run into Unfriendly AI:
(1) If there is a non-negligible chance that we’ll have FAI in the future, or that we could’ve created FAI if some morally random facts in the past (such as the coin in counterfactual mugging) were different, then we can estimate the present expected value of the world as pretty high, since a term for getting whole universes (counterfactually or with some probability) optimized towards your specific preference appears in the expected utility computation. The counterfactual value is present even if it’s certain that the future contains Unfriendly AI.
(2) It’s even better, because the unfriendly singletons will also optimize their worlds towards your preference a little for game-theoretic reasons, even if they don’t care at all about your preference. This game is not with you personally, a human who controls very little and whose control can’t compel a singleton to any significant extent, but with the counterfactual FAIs. The FAIs that could have been created, but weren’t, can act as Omega in counterfactual mugging, making it profitable for the indifferent singletons to pay the FAI a little, in the form of FAI-favored world-optimization. (A toy expected-value sketch of counterfactual mugging follows below.)
(3) Some singletons that don’t follow your preference in particular, but have remotely human-like preferences, will have a component of sympathy in their preference, and will dole out to your preference some fair portion of control in their world, one that is much greater than the portion of control you held originally. This sympathy seems to be godshatter of the game-theoretic considerations that compel even singletons with non-evolved (artificial, random) preferences, per arguments (1) and (2).
The conclusion to this seems to be that creating an Unfriendly AI is significantly better than ending up with no rational singleton at all (existential disaster that terminates civilization), but significantly worse than a small chance of FAI.
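For reference, here is the expected-value arithmetic behind the counterfactual mugging appealed to in (2), using the standard illustrative payoffs rather than anything specified in this thread:

```python
# Omega flips a fair coin. On tails it asks you for $100; on heads it pays you
# $10,000 only if it predicts you would have paid on tails. Evaluate the policy
# before the coin flip rather than after seeing the result:
P_HEADS = 0.5
PAYMENT, PRIZE = 100, 10_000

ev_pay_policy    = P_HEADS * PRIZE + (1 - P_HEADS) * (-PAYMENT)  # +4950
ev_refuse_policy = 0.0

print(ev_pay_policy, ev_refuse_policy)
# An agent that chooses by the ex-ante policy pays; by analogy, a singleton
# indifferent to us may still cede a little optimization to the counterfactual
# FAIs if doing so is part of the globally better policy.
```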
Your comments are mostly good, but I dispute the final assumption that no singleton ⇒ disaster. There has as yet been no investigation into the merits of a singleton vs. an economy (or ecosystem) of independent agents.
If we were living in the 18th century, it would be reasonable to suppose that the only stable situation is one where one agent is king. But we are not.
Yep, these are key considerations.
So there’s the utility difference between business-as-usual (no AI) and getting a small share of resources optimized for your preference, and the utility difference between getting small and large shares of resources. If the second difference is much larger than the first, then (1) is crucial, and (2) and (3) are not so good. But if the first difference is much bigger than the second, the pattern is the reverse.
And if we’re comparing expected utility conditioning on no local FAI here and EU conditioning on FAI here, moderate credences can suffice (depending on the shape of your utility function).
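A minimal sketch of this comparison, with made-up utility numbers that only illustrate how the relative size of the two differences changes which arguments matter:

```python
# Utilities on an arbitrary scale; the numbers are purely illustrative.
scenarios = {
    "sharply diminishing returns": {"no_ai": 0, "small_share": 9, "large_share": 10},
    "near-linear returns":         {"no_ai": 0, "small_share": 1, "large_share": 100},
}

for name, u in scenarios.items():
    first_diff = u["small_share"] - u["no_ai"]         # value of getting any share at all
    second_diff = u["large_share"] - u["small_share"]  # value of a large vs. a small share
    print(name, first_diff, second_diff)

# When the first difference dominates, UFAI worlds that concede a small share already
# capture most of the attainable value, so arguments (2) and (3) matter a lot; when the
# second dominates, almost everything rides on actually getting FAI.
```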
Whether the FAI is local or not can’t matter; whether something is real or counterfactual is morally irrelevant. If we like small control, it means that the possible worlds with UFAI are significantly valuable, just as the worlds with FAI are, provided there are enough worlds with FAI to weakly control the UFAIs; and if we like only large control, it means that the possible worlds with UFAI are not as valuable, and it’s mostly the worlds with FAI that matter.
What do “small control” and “large control” mean?
It’s not literally the reverse, because if you don’t create those FAIs, nobody will, and so the UFAIs won’t have the incentive to give you your small share. It’s never good to increase the probability of UFAI at the expense of the probability of FAI. I’m not sure whether there is any policy guideline suggested by these considerations, conditional on the pattern in utility you discuss. What should we do differently depending on how much we value small vs. large control? In both cases, it’s still clearly preferable to have UFAI rather than no future AI at all, and FAI rather than UFAI.
Worrying less about our individual (or national) shares and being more cooperative with other humans or uploads seems like an important upshot.
I’m not convinced by the claim that human values have high Kolmogorov complexity.
In particular, Eliezer’s article Not for the Sake of Happiness Alone is totally at odds with my own beliefs. In my mind, it’s incoherent to give anything other than subjective experiences ethical consideration. My own preference for real science over imagined science is entirely instrumental and not at all terminal.
Now, maybe Eliezer is confused about what his terminal values are, or maybe I’m confused about what my terminal values are, or maybe our terminal values are incompatible. In any case, it’s not obvious that an AI should care about anything other than the subjective experiences of sentient beings.
Suppose that it’s okay for an AI to exclude everything but subjective experience from ethical consideration. Is there then still reason to expect that human values have high Kolmogorov complexity?
I don’t have a low-complexity description to offer, but it seems to me that one can get a lot of mileage out of the principles “if an individual prefers state A to state B whenever he/she/it is in either of state A or state B, then state A is superior for that individual to state B” and “when faced with two alternatives, the moral alternative is the one that you would prefer if you were going to live through the lives of all sentient beings involved.”
Of course “sentient being” is ill-defined and one would have to do a fair amount of work to frame the things that I just said in more formal terms, but anyway, it’s not clear to me that there’s a really serious problem here.
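One way to read the second principle is as summing experienced utility over everyone affected. The sketch below is only that reading, with hypothetical names and numbers, and with the hard parts (identifying the sentient beings and measuring their experiences) taken as given inputs:

```python
from typing import Dict

def moral_choice(alternatives: Dict[str, Dict[str, float]]) -> str:
    """Pick the alternative you'd prefer if you lived through every affected life.

    `alternatives` maps an option name to the experienced utility of each sentient
    being under that option; both the list of beings and the utility numbers are
    assumed to be given, which is exactly where the acknowledged difficulty lies.
    """
    return max(alternatives, key=lambda option: sum(alternatives[option].values()))

# Hypothetical example:
print(moral_choice({
    "option_a": {"alice": 5.0, "bob": 1.0},
    "option_b": {"alice": 3.0, "bob": 4.0},
}))  # -> "option_b"
```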
I totally agree that if the creation of a superhuman AI is going to precede all other existential threats, then we should focus all of our resources on trying to get the superhuman AI to be as friendly as possible.
Have you read the Heaven post by denisbider and the two follow-ups constituting a mini-wireheading series? There have been other posts on the difference between wanting and liking, but the series illustrates a fairly strong problem with wireheading: even if all we’re worried about is “subjective states,” many people won’t want to be put in that subjective state, even knowing they’ll like it. Forcing them into it, or changing their value system so they do want it, are ethically suboptimal solutions.
So, it seems to me that if anything other than maximized absolute wireheading for everyone is the AI’s goal, it’s gonna start to get complicated.
Thanks for the references to the posts which I had not seen before and which I find relevant. I’m sympathetic toward denisbider’s view, but will read the comments to see if I find diverging views compelling.
Maybe you should start with what’s linked from fake fake utility functions then (the page on the wiki wasn’t organized quite as I expected).
But I would qualify the last sentence of my reply by saying that the best way to get a superhuman AI to be as friendly as possible may not be to work on friendly AI or advocate for friendly AI. For example, it may be best to work toward geopolitical stability to minimize the chances of some country rashly creating a potentially unsafe AI out of a sense of desperation during wartime.
(?) I never said that.
Yes, I was agreeing with what I inferred your attitude to be rather than agreeing with something that you said. (I apologize if I distorted your views—if you’d like I can edit my comment to remove the suggestion that you hold the position that I attributed to you.)
I don’t believe that we “should focus all of our resources” on FAI, as there are many other worthy activities to focus on. The argument is that this particular problem gets disproportionately little attention, and while with other risks we can in principle luck out even if they get no attention, that isn’t so for AI. Failing to take FAI seriously is fatal; failing to take nanotech seriously isn’t necessarily fatal.
Thus, although strictly speaking I agree with your implication, I don’t find its condition plausible, and so I don’t find the implication as a whole relevant.
Re: “Is there then still reason to expect that human values have high Kolmogorov complexity?”
Human values are mostly a product of people’s genes and their memes. There is an awful lot of information in those. However, it is true that you can fairly closely approximate human values—or those of any other creature—by the directive to make as many grandchildren as possible, which seems reasonably simple.
Most of the arguments for humans having complex values appear to list a whole bunch of proximate goals—as though that constitutes evidence.
I disagree. You need to know much more than just the drive for grandchildren, given the massively diverse ways, observable even in our present world, in which species propagate, all of which would correspond to different articulable values once those species reached human intelligence.
Human values should be expected to have a high K-complexity because you would need to specify both the genes/early environment, and the precise place in history/Everett branches where humans are now.
The idea was to “approximate human values”—not to express them in precise detail: nobody cares much if Jim likes strawberry jam more than he likes raspberry jam.
The environment mostly drops out of the equation—because most of it is shared between the agents involved—and because of the phenomenon of Canalisation: http://en.wikipedia.org/wiki/Canalisation_%28genetics%29
Sure, but I take “approximation” to mean something like getting you within 10 or so bits of the true distribution, whereas the heuristic you gave still leaves you maybe 500 or so bits away, which is huge, and far more than you implied.
That would help with message length if you had already stored one person’s values and were looking to store a second person’s. It does not help for describing the first person’s values, or some aggregate measure of humans’ values.
10 bits!!! That’s not much of a message!
The idea of a shared environment arises because the proposed machine—in which the human-like values are to be implemented—is to live in the same world as the human. So, one does not need to specify all the details of the environment—since these are shared naturally between the agents in question.
10 bits short of the needed message, not a 10-bit message. I mean that, e.g., an approximation gives 100 bits when full accuracy would require 110 bits (and 10 bits is an upper bound on the gap).
That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.
Re: “That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.”
To specify the environment, choose the universe, galaxy, star, planet, latitude, longitude and time. I am not pretending that information is simple, just that it is already there, if your project is building an intelligent agent.
Re: “10 bits short of the needed message”.
Yes, I got that the first time. I don’t think you are appreciating the difficulty of coding even relatively simple utility functions. A couple of ASCII characters is practically nothing!
ASCII characters aren’t a relevant metric here. Getting within 10 bits of the correct answer means that you’ve narrowed it down to 2^10 = 1024 distinct equiprobable possibilities [1], one of which is correct. Sounds like an approximation to me! (if a bit on the lower end of the accuracy expected out of one)
[1] or a probability distribution with the same KL divergence from the true governing distribution
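For concreteness, the arithmetic behind the footnote and the 10-bit vs. 500-bit disagreement, as a quick sketch (the bit counts themselves are the thread’s rough guesses, not measured quantities):

```python
def candidates_left(bits_short: int) -> float:
    # Being k bits short of a full description leaves roughly 2**k equally likely
    # candidates (equivalently, a probability penalty factor of 2**-k).
    return 2.0 ** bits_short

print(candidates_left(10))   # 1024.0 -- arguably still an "approximation"
print(candidates_left(500))  # ~3.3e150 -- no meaningful narrowing-down at all
```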
Or you can implement a constant-K-complexity learn-by-example algorithm and get all the rest from the environment.
How about “Do as your creators do (generalize this as your creators generalize)”?