There certainly is a lot of moral prescription going on. This is mostly indirect, implicit in the kind of questions that get asked rather than directly asserted. “Expected utility” is the right thing to optimise for, almost by definition. But there is more than that at play. In particular, there tends to be an assumption that other people’s utility functions will, and in fact ‘should’, contribute to mine in a simple, sometimes specific, way. I don’t particularly respect that presumption.
Edit: Fixed the typo that cousin_it tactfully corrected in his quote.
You don’t value other people’s lives because they value their own lives. Paperclip maximizers value paperclips, but you won’t take that into account. It’s not so much contribution of other people’s utility functions that drives your decisions (or morality). You just want mostly the same things, and care about others’ well-being (which you should to an unknown extent, but which you obviously do at least somewhat).

I agree with that summary completely.
“Expected utility” is the right thing to optimise for, almost by definition.
This isn’t clear. Preferences of any actual human seem to form a directed graph, but it’s incomplete and can contain cycles. Any way to transform it into a complete acyclic graph (any pair of situations comparable, no preference loops) must differ from the original graph somewhere. Different algorithms will destroy different facets of actual human preference, but there’s certainly no algorithm that can preserve all of it; that much we can consider already proven beyond reasonable doubt. It’s not obvious to me that there’s a single, well-defined, canonical way to perform this surgery.
And it’s not at all obvious that going from a single human to an aggregate of all humanity will mitigate the problem (see Torture vs Specks). That’s just too many leaps of faith.
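A minimal toy sketch of the cycle problem, purely illustrative (the options and the preferences among them are made up, not taken from the thread): the preferences A > B > C > A, plus an option D that is compared to nothing, cannot all be respected by any total order, and which preference gets sacrificed depends on which order you pick.

```python
# Illustrative only: a cyclic, incomplete preference relation, and the fact
# that every total order over the options must sacrifice some part of it.
from itertools import permutations

# A directed edge (x, y) means "x is preferred to y".
# A > B > C > A is a cycle; D is compared to nothing (incompleteness).
prefs = {("A", "B"), ("B", "C"), ("C", "A")}
options = ["A", "B", "C", "D"]

def violations(order, edges):
    """Count preference edges that a proposed total order contradicts."""
    rank = {x: i for i, x in enumerate(order)}
    return sum(1 for better, worse in edges if rank[better] > rank[worse])

# No total order respects all three preferences...
best = min(permutations(options), key=lambda o: violations(o, prefs))
print(best, violations(best, prefs))              # best achievable: 1 violation

# ...and different "repairs" sacrifice different preferences.
print(violations(("A", "B", "C", "D"), prefs))    # 1: gives up C > A
print(violations(("C", "A", "B", "D"), prefs))    # 1: gives up B > C
```

Which edge to give up is exactly the choice for which, on this view, there is no obviously canonical answer.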
I agree/upvoted your point. Human preferences are cyclic. I’d go further and say that without at least an acyclic preference graph it is not possible to optimise a decision at all. The very thought seems meaningless.
Assuming one can establish coherent preferences, the question of whether one should optimise for expected utility encounters a further complication. Many human preferences refer to our actions and not to outcomes. An agent could in fact decide to optimise for making ‘Right’ choices and to hell with the consequences. They could choose not to optimise for expected utility. Of course, it seems like that choice was the one with the highest expected value according to their rather wacky utility function.
It’s not an observation that warrants much more than those three words and the comma, but it seems to me that either you are optimising a decision for expected utility or you are doing some other thing than optimising. ‘Expected utility’ just happens to be the name given to the value of the function you use when you are optimising a decision.
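To make the point about action-directed preferences concrete, here is a toy sketch (an invented example, not from the thread): an agent that cares only about performing the “right” act still looks like an expected-utility maximiser, just with a utility function defined over acts rather than outcomes.

```python
# Invented example: "to hell with the consequences" as utility over the act itself.

def outcome_utility(outcome: str) -> float:
    # Cares only about how things turn out.
    return {"good": 1.0, "bad": 0.0}[outcome]

def deontic_utility(action: str, outcome: str) -> float:
    # Ignores the outcome entirely and rewards only the "right" act.
    return 1.0 if action == "tell the truth" else 0.0

# Suppose lying happens to produce the better outcome.
cases = {"tell the truth": "bad", "lie": "good"}

print(max(cases, key=lambda a: outcome_utility(cases[a])))     # -> lie
print(max(cases, key=lambda a: deontic_utility(a, cases[a])))  # -> tell the truth
```

Both agents are optimising something; they differ only in what the function they optimise is defined over.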
In the light of the correction you’ve made just now, do you retract this comment as well? (It looks to be based on the same mistake, but if you don’t think so, I’d like to argue.)
No, it’s a different point, and one I’d be happy to argue. Here I talk about encoding actual human preferences over all possible futures, not designing an algorithm that will yield one good future. For example, an algorithm that gives one good future may never actually have to worry about torture vs dust specks. So it’s not clear that we should worry about it either.
Preferences of any actual human seem to form a directed graph, but it’s incomplete and can contain cycles.
I suspect you are not talking about neurons in the brain, but have no idea what you do mean...
Any way to transform it into a complete acyclic graph (any pair of situations comparable, no preference loops) must differ from the original graph somewhere. Different algorithms will destroy different facets of actual human preference, but there’s certainly no algorithm that can preserve all of it; that much we can consider already proven beyond reasonable doubt. It’s not obvious to me that there’s a single, well-defined, canonical way to perform this surgery.
By the Church-Turing thesis, you can construct an artifact behaviorally indistinguishable from a human even on the basis of expected utility maximization (even though it’s an inadequate thing to do). Whatever you can expect of a real human, including answering hypothetical questions, you can expect from this construction.
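A sketch of why this is trivially possible in principle (a toy example; `human_policy` is a made-up stand-in for the behavioural data): any fixed behaviour can be dressed up as expected-utility maximisation by choosing a utility function that rewards exactly that behaviour.

```python
# Toy sketch: dressing an arbitrary fixed policy up as expected-utility maximisation.

def human_policy(observation: str) -> str:
    """Hypothetical stand-in for whatever the real human would do."""
    return {"offered tea": "accept", "offered decaf": "decline"}.get(observation, "shrug")

def utility(observation: str, action: str) -> float:
    # Utility 1 for the action the human would take, 0 for anything else.
    return 1.0 if action == human_policy(observation) else 0.0

def eu_maximizer(observation: str, actions=("accept", "decline", "shrug")) -> str:
    # The toy world is deterministic, so expected utility is just utility.
    return max(actions, key=lambda a: utility(observation, a))

assert all(eu_maximizer(o) == human_policy(o)
           for o in ("offered tea", "offered decaf", "offered nothing"))
```

The construction mimics behaviour without representing anything beyond it, which is the sense in which it is a surrogate rather than an improvement.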
Here I talk about encoding actual human preferences over all possible futures, not designing an algorithm that will yield one good future. For example, an algorithm that gives one good future may never actually have to worry about torture vs dust specks. So it’s not clear that we should worry about it either.
Algorithms are strategies: they are designed to work depending on observations. When you design an algorithm, you design behaviors for all possible futures. Other than giving this remark, I don’t know what to do with your comment...

Nodes in the graph are hypothetical situations, and arrows are preferences.
Preference as order on situations? Make that order on histories, or better, order on games to be provably won; but you should already know that, so again I don’t see what you are saying.
Oh, okay, on possible histories. I really don’t understand what’s unclear to you. It’s not obvious to me that there’s a unique canonical way to build a complete acyclic graph (utility-based preference) from an incomplete graph with cycles (actual human preference). Yes, expected utility optimization can mimic any behavior, but I don’t want to mimic behavior, I want to represent the data structure of preferences.
By C-T, you can represent any data, right? The utility-surrogate can have a detailed scan of a human in its virtual utility-maximizing pocket, or even run a simulation of human brain, just on a different substrate.
For histories: you argue that people have cyclic preference over world histories as well, because you consider preference to be the same thing as choice, which is prone to whim? That’s not what I mean by preference (which you should also know), but it explains your comments in this thread.
Whims are all we can observe. We disagree on whether whims can be canonically regularized into something coherent. I don’t think Eliezer knows that either (it’s kind of similar to the question whether humanity’s volition coheres). Yeah, he’s trying to regularize his whims, and you may strive for that too, but what about the rest of us?
You can consider a person as a system that gives various counterfactual reactions to interaction—most of these reactions won’t be observed in the history of what actually happened to that person in the past. While it e.g. makes sense to talk about what a person (actually) answered to a question asked in English, you are not working with concepts themselves in this setting: just as the interpretation of words is a little iffy, deeper understanding of the meaning of the words (by the person who answers the questions) is even more iffy.
What you need in order to talk about preference is to compare huge formal strategies or games (not even snapshots of the history of the world), while what you get in the naive setting is asking “yes/no” questions in English.

Unavailability of an adequate formalization of what it means to ask the actual question about consequences doesn’t justify jumping to identifying preference with “yes/no” utterances resulting from questions obtained in an unspecified manner.
I don’t see how going from yes/no questions to simulated games helps. People will still exhibit preference reversals in their actions, or just melt down.
I wasn’t proposing a solution (I wasn’t talking about simulating humans playing a game—I was referring to a formal object). The strategies that need to be compared are too big for a human to comprehend—that’s one of the problems with defining what the preference is via asking questions (or simulating humans playing games). When you construct questions about the actual consequences in the world, you are simplifying, and through this simplification lose precision. That a person can make mistakes, can be wrong, is the next step through which this process loses the original question, and a way in which you can get incoherent responses: that’s noise. It doesn’t follow from the presence of noise that noise is inherent in the signal, and it doesn’t make sense to define signal as signal with noise.
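One toy way to picture the signal/noise distinction (illustrative only, and it assumes precisely what is in dispute, namely that a stable underlying ranking exists): model elicited answers as a fixed latent utility plus response noise, so that individual answers are sometimes reversed while repeated queries recover the underlying ranking.

```python
# Toy random-utility model: a stable underlying ranking ("signal") plus
# response noise; single answers are sometimes reversed, the aggregate is not.
import random

random.seed(0)
true_utility = {"A": 2.0, "B": 1.0}   # the signal: A really is preferred to B

def noisy_answer(x: str, y: str, noise_sd: float = 1.5) -> str:
    """One elicited answer, occasionally wrong because of the noise term."""
    ux = true_utility[x] + random.gauss(0.0, noise_sd)
    uy = true_utility[y] + random.gauss(0.0, noise_sd)
    return x if ux > uy else y

answers = [noisy_answer("A", "B") for _ in range(1000)]
print(answers.count("B"))             # a noticeable number of reversals
print(max("AB", key=answers.count))   # but the recovered ranking is still "A"
```

Whether human preference actually has this structure, rather than the noise being inseparable from the signal, is what the rest of the exchange is about.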
But you need at least a conceptual way to tell signal from noise. Maybe an analogy will help: do you also think that there’s an ideal Platonic market price that gets tainted by real-world “noise”?
I don’t understand market prices well enough to make this analogy. I don’t propose solutions; I merely say that considering noise as part of the signal ignores the fact that it’s noise. There is even a strong human intuition that there are errors. If I understand that I made an error, I consider it preferable that my responses-in-error not be considered correct by definition.
The concept of a correct answer is distinct from the concept of the answer actually given. When we ask questions about preference, we are interested in correct answers, not in answers actually given. Furthermore, we are interested in correct answers to questions that can’t physically be either asked of or answered by a human.
Formalizing the sense of correct answers is a big chunk of FAI, while formalizing the sense of actual answers or even counterfactual actual answers is trivial if you start from physics. It seems clear that these concepts are quite different, and the (available) formalization of the second doesn’t work for the first. Furthermore, “actual answers” also need to be interfaced with a tool that states “complete-states-of-the-world with all quarks and stuff” as human-readable questions.
Preferences of any actual human seem to form a directed graph, but it’s incomplete and can contain cycles. Any way to transform it into a complete acyclic graph (any pair of situations comparable, no preference loops) must differ from the original graph somewhere.
What graph??? An accurate account should take care of every detail. I feel you are attacking some simplistic strawman, but I’m not sure of what kind.
Do you agree that it’s possible in principle to implement an artifact behaviorally indistinguishable from a human being that runs on expected utility maximization, with a sufficiently huge “utility function” and some simple prior? Well, this claim seems to be both trivial and useless, as it speaks not about improvement, just about a surrogate.

Why? I don’t see any conclusive justification for that yet, except mathematical convenience.