A recursively self-improving PhilGoetz with that sort of power and growth rate will be an unfriendly singularity.
How do you infer that? Also, is CEV any better? I will be justly insulted if you prefer the average of all human utility functions to the PhilGoetz utility function.
You’re familiar with CEV, so I’ll try to reply with the concepts from Eliezer’s CEV document:
Defining Friendliness is not the life-or-death problem on which the survival of humanity depends. It is a life-or-death problem, but not the life-or-death problem. Friendly AI requires:
1. Solving the technical problems required to maintain a well-specified abstract invariant in a self-modifying goal system. (Interestingly, this problem is relatively straightforward from a theoretical standpoint.)
2. Choosing something nice to do with the AI. This is about midway in theoretical hairiness between problems 1 and 3.
3. Designing a framework for an abstract invariant that doesn’t automatically wipe out the human species. This is the hard part.
PhilGoetz does not have a framework for maintaining a well-specified abstract invariant in a self-modifying goal system. If Phil were “seeming to be growing too powerful too quickly”, then quite likely the same old human problems are occurring, and a whole lot more besides.
The problem isn’t with your values (your CEV); the problem is that you aren’t a safe system for producing a recursively self-improving singularity. Humans don’t even keep the same values when you give them power, let alone when they are hacking their brains into unknown territory.
When talking about one individual, there is no C in CEV.
I use ‘extrapolated volition’ when talking about the outcome of the process upon an individual. “Coherent Extrapolated Volition” would be correct but redundant. When speaking of instantiations of CEV with various parameters (of individuals, species or groups) it is practical, technically correct and preferred to write CEV regardless of the count of individuals in the parameter. Partly because it should be clear that the CEV of an individual and the CEV of humanity are talking about things very similar in kind. Partly because if people see “CEV” and google it they’ll find out what it means. Mostly because the ‘EV’ acronym is overloaded within the nearby namespace.
AVERAGE(3.1415) works in Google Docs. It returns 3.1415. If you are comparing a whole heap of aggregations of a feature, some of which only have one value, it is simpler to just use the same formula.
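A minimal sketch of that point in Python, with made-up numbers: the same averaging formula covers both the single-value and many-value cases, so nothing special is needed for an aggregation over one person.

```python
# Minimal sketch: an aggregation like AVERAGE degenerates gracefully to the
# identity when given a single value, so the same formula covers both the
# one-person and many-person cases. The numbers are purely illustrative.
def average(values):
    return sum(values) / len(values)

assert average([3.1415]) == 3.1415        # singleton: the average is just the value
assert average([1.0, 2.0, 3.0]) == 2.0    # many values: the ordinary mean
```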
Seems reasonable.
I think I’d prefer the average of all human utility functions to any one individual’s utility function; don’t take it personally.
Is that Phil Goetz’s CEV vs. all humans’ CEV, or Phil Goetz’s current preferences or behaviour-function vs. the average of all humans’ current preferences or behaviour-functions? In the former scenario, I’d prefer the global CEV (if I were confident that it would work as stated). In the latter, even though I don’t remember much about Phil and his views other than that he appears to be an intelligent, educated Westerner who can be expected to be fairly reliably careful about potentially world-changing actions, I’d probably feel safer with him as world dictator than with a worldwide direct democracy automatically polling everyone in the world on what to do, considering the kinds of humans who currently make up a large majority of the population.
Voted up for distinguishing these things.
“I am obliged to confess I should sooner live in a society governed by the first two thousand names in the Boston telephone directory than in a society governed by the two thousand faculty members of Harvard University.” —William F. Buckley, Jr.
Yes, I agree that William F. Buckley, Jr. probably disagrees with me.
Heh. I figured you’d heard the quote: I just thought of it when I read your comment.
I agree with Buckley, mainly because averaging would smooth out our evolved unconscious desire to take power for ourselves when we become leaders.
I can’t agree with that. I’ve got a personal bias against people with surnames starting with ‘A’!
Sorry, Phil, but now we’ve got a theoretical fight on our hands between my transhuman value set and yours. Not good for the rest of humanity. I’d rather our rulers had values that benefited everybody on average, not values skewed towards your value set (or mine) at the expense of everybody else.
Phil: an AI that is seeking resources to further its own goals at the expense of everyone else is by definition an unfriendly AI.
Transhuman AI PhilGoetz is such a being.
Now consider this: I’d prefer the average of all human utility functions over my maximized utility function even if it means I have less utility.
I don’t want humanity to die, and I am prepared to die myself to prevent it from happening.
Which of the two utility functions would most of humanity prefer, hmmmmm?
If you would prefer A over B, it’s very unclear to me what it means to say that giving you A instead of B reduces your utility.
It’s not unclear at all. Utility is the satisfaction of needs.
Ah. That’s not at all the understanding of “utility” I’ve seen used elsewhere on this site, so I appreciate the clarification, if not its tone.
So, OK. Given that understanding, “I’d prefer the average of all human utility functions over my maximized utility function even if it means I have less utility” means that xxd would prefer (on average, everyone’s needs are met) over (xxd’s needs are maximally met). And you’re asking whether I’d prefer that xxd’s preferences be implemented, or those of “an AI that is seeking resources to further its own goals at the expense of everyone else”, which you’re calling “Transhuman AI PhilGoetz”… yes? (I will abbreviate that T-AI-PG hereafter.)
The honest answer is I can’t make that determination until I have some idea what having everyone’s needs met actually looks like, and some idea of what T-AI-PG’s goals look like. If T-AI-PG’s goals happen to include making life awesomely wonderful for me and everyone I care about, and xxd’s understanding of “everyone’s needs” leaves me and everyone I care about worse off than that, then I’d prefer that T-AI-PG’s preferences be implemented.
That said, I suspect that you’re taking it for granted that T-AI-PG’s goals don’t include that, and also that xxd’s understanding of “everyone’s needs” really and truly makes everything best for everyone, and probably consider it churlish and sophist of me to imply otherwise.
So, OK: sure, if I make those further assumptions, I’d much rather have xxd’s preferences implemented than T-AI-PG’s preferences. Of course.
Your version is exactly the same as Phil’s, except that you’ve enlarged it to maximize your utility and that of everyone you care about, rather than maximizing the utility of humanity as a whole.
When we actually do get an FAI (if we do), it is going to be very interesting to see how this resolves, given that even those of us who are thinking about it ahead of time can’t agree on the goals defining what an FAI should actually shoot for.
I do not understand what your first sentence means.
As for your second sentence: stating what it is we value, even as individuals (let alone collectively), in a sufficiently clear and operationalizable form that it could actually be implemented, in a sufficiently consistent form that we would want it implemented, is an extremely difficult problem. I have yet to see anyone come close to solving it; in my experience the world divides neatly into people who don’t think about it at all, people who think they’ve solved it and are wrong, and people who know they haven’t solved it.
If some entity (an FAI or whatever) somehow successfully implemented a collective solution it would be far more than interesting, it would fundamentally and irrevocably change the world.
I infer from my reading of your tone that you disagree with me here; the impression I get is that you consider the fact that we haven’t agreed on a solution to demonstrate our inadequacies as problem solvers, even by human standards, but that you’re too polite to say so explicitly. Am I wrong?
We actually agree on the difficulty of the problem. I think it’s very difficult to state what it is that we want AND that if we did so we’d find that individual utility functions contradict each other.
Moreover, I’m saying that an AI maximizing Phil Goetz’s utility function, or yours and that of everybody you love (or even my own selfish desires and wants plus those of everyone I love), COULD in effect be an unfriendly AI, because MANY others would have theirs minimized.
So I’m saying that I think a friendly AI has to have its goals defined as:
Choice A. the maximum number of people have their utility functions improved (rather than maximized), even if some minimized number of people have their utility functions worsened
as opposed to
Choice B. a small number of people have their utility functions maximized while a large number of people have their utility functions decreased (or zeroed out).
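A rough numerical sketch of the two choices, with entirely made-up per-person utility changes, just to illustrate the trade-off being described:

```python
# Sketch of Choice A vs. Choice B using hypothetical per-person utility changes.
# Positive numbers mean that person's utility improved; negative means it worsened.
choice_a = [+1, +1, +1, +1, -1]        # most people modestly better off, a few worse off
choice_b = [+100, +100, -5, -5, -5]    # a small group maximized, the majority worse off

def fraction_improved(changes):
    # How many people end up better off under a given policy.
    return sum(1 for c in changes if c > 0) / len(changes)

print(fraction_improved(choice_a))  # 0.8 -- Choice A improves most people
print(fraction_improved(choice_b))  # 0.4 -- Choice B improves only a small group
```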
As a side note: I find it amusing that it’s so difficult to even understand each other’s basic axioms, never mind agree on the details of what maximizing the utility function for all of us as a whole means.
To be clear: I don’t know what the details are of maximizing the utility function for all of humanity. I just think that a fair maximization of the utility function for everyone has an interesting corollary: in order to maximize the function for everyone, some will have their individual utility functions decreased, unless we accept a much narrower definition of “friendly” meaning “friendly to me”, in which case, as far as I’m concerned, it no longer means friendly.
The logical tautology here is of course that those who consider “friendly to me” the only possible definition of friendly would consider an AI that maximized the average utility function of humanity, under which they themselves lost out, to be an UNfriendly AI.
Couple of things:
If you want to facilitate communication, I recommend that you stop using the word “friendly” in this context on this site. There’s a lot of talk on this site of “Friendly AI”, by which is meant something relatively specific. You are using “friendly” in the more general sense implied by the English word. This is likely to cause rather a lot of confusion.
You’re right that if strategy 1 optimizes for good stuff happening to everyone I care about, and strategy 2 optimizes for good stuff happening to everyone whether I care about them or not, then strategy 1 will (if done sufficiently powerfully) result in people I don’t care about having good stuff taken away from them, and strategy 2 will result in everyone I care about getting less good stuff than strategy 1 will.
You seem to be saying that I therefore ought to prefer that strategy 2 be implemented, rather than strategy 1. Is that right?
You seem to be saying that you yourself prefer that strategy 2 be implemented, rather than strategy 1. Is that right?
Fair enough. I will read the wiki.
Yes
Not saying anything about your preferences.
Nope, I’m saying strategy 2 is better for humanity. Of course personally I’d prefer strategy 1, but I’m honest enough with myself to know that certain individuals would find their utility functions severely degraded if I had an all-powerful AI working for me; and if I don’t trust myself to be in charge, then I don’t trust any other human, unless it’s someone like Gandhi.
It’s not as clear as you think it is. I’m not familiar with any common definition of “utility” that unambiguously means “the satisfaction of needs”, nor was I able to locate one in a dictionary.
“Utility” is used hereabouts as a numerical value assigned to outcomes such that outcomes with higher utilities are always preferred to outcomes with lower utilities. See Wiki:Utility function.
Nor am I familiar with “sophist” used as an adjective.
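A minimal sketch of the decision-theoretic sense of “utility” described above; the outcomes, numbers, and function names here are purely illustrative assumptions, not anything from the discussion:

```python
# Minimal sketch of "utility" as used here: a number assigned to each outcome
# such that the agent always prefers the outcome with the higher number.
# Outcomes and values are made up for illustration.
utility = {"stay home": 2.0, "go hiking": 5.0, "work overtime": 1.0}

def choose(options, u):
    # The agent's choice is simply the option with the highest utility.
    return max(options, key=lambda o: u[o])

print(choose(list(utility), utility))  # -> "go hiking"
```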
Utility is generally meant to be “economic utility” in most discussions I take part in, notwithstanding the definition you’re espousing hereabouts.
I believe that the definition of utility you’re giving is far too open and could all too easily lead to smiley world.
It is very common to use nouns as adjectives where no distinct adjective already exists and thus saying someone is “sophist” is perfectly acceptable English usage.
Yeah, that doesn’t quite nail it down either. Note Wiktionary:utility (3):
(economics) The ability of a commodity to satisfy needs or wants; the satisfaction experienced by the consumer of that commodity.
It ambiguously allows both ‘needs’ and ‘wants’, as well as the ambiguous ‘satisfaction experienced’.
The only consistent, formal definition of utility I’ve seen used in economics (or game theory) is the one I gave above. If it was clear someone was not using that definition, I might assume they were using it as more generic “preference satisfaction”, or John Stuart Mill’s difficult-to-formalize-coherently “pleasure minus pain”, or the colloquial vague “usefulness” (whence “utilitarian” is colloquially a synonym for “pragmatic”).
Do you have a source defining utility clearly and unambiguously as “the satisfaction of needs”?
No, you’re right, it doesn’t nail it down precisely (“the satisfaction of needs or wants”).
I do believe, however, that it nails it down more precisely than the wiki on here.
Or on second thoughts maybe not, because we again come back to conflicting utilities: a suicidal person might assign a higher utility to being killed than someone who is sitting on death row and doesn’t want to die would.
And I was using the term “utility” from economics, since that’s the only place I’ve heard “utility function” used, so I naturally assumed that’s what you were talking about; even if we disagree around the edges, the meanings still fit the context for the purposes of this discussion.
The question is whether the PhilGoetz utility function or the average human utility function is better. Assume both are implemented in AIs of equal power. What makes the average human utility function “friendlier”? It would have you outlaw homosexuality and sex before marriage, remove all environmental protection laws, make child abuse and wife abuse legal, take away legal rights from women, give wedgies to smart people, etc.
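A minimal sketch, with entirely hypothetical outcomes and numbers, of the structural point at issue: a utility function averaged over a population can select an outcome that a particular individual’s own utility function ranks low. It does not attempt to model what the average human utility function actually contains.

```python
# Sketch of "one person's utility function vs. the average of everyone's".
# Each dict is one person's (made-up) utilities over three outcomes.
population = [
    {"A": 9, "B": 2, "C": 1},   # an individual who strongly prefers A
    {"A": 1, "B": 8, "C": 3},
    {"A": 2, "B": 7, "C": 4},
    {"A": 1, "B": 6, "C": 5},
]

def average_utility(outcome):
    # Population-averaged utility of a given outcome.
    return sum(person[outcome] for person in population) / len(population)

outcomes = ["A", "B", "C"]
print(max(outcomes, key=average_utility))      # "B": the averaged function's choice
print(max(outcomes, key=population[0].get))    # "A": the first individual's choice
```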
I don’t think you understand utility functions.
“The question is whether the PhilGoetz utility function or the average human utility function is better.”
That is indeed the question. But I think you’ve framed and stacked the deck here with your description of what you believe the average human utility function is, in order to take the moral high ground rather than arguing against my point, which is this:
How do you maximize the preferred utility function for everyone instead of just a small group?