Yeah, it’s weird that Eliezer’s metaethics and FAI seem to rely on figuring out “true meanings” of certain words, when Eliezer also wrote a whole sequence explaining that words don’t have “true meanings”.
For example, Eliezer’s metaethical approach (if it worked) could be used to actually answer questions like “if a tree falls in the forest and no one’s there, does it make a sound?”, not just declare them meaningless :-) Namely, it would say that “sound” is not a confused jumble of “vibrations of air” and “auditory experiences”, but a coherent concept that you can extrapolate by examining lots of human brains. Funny I didn’t notice this tension until now.
I’ve argued before that CEV is just a generic method for solving confusing problems (simulate a bunch of smart and self-improving people and ask them what the answers are), and the concept (as opposed to the actual running of it) offers no specific insights into the nature of morality.
In the case of “if a tree falls in the forest and no one’s there, does it make a sound?”, “extrapolating” would work pretty well, I think. The extrapolation could start with someone totally confused about what sound is (e.g., “it’s something that God created to let me hear things”), and then move on to a confused jumble of “vibrations of air” and “auditory experiences”, and then to the understanding that by “sound” people sometimes mean “vibrations”, sometimes “experiences”, and sometimes are just confused.
ETA: I agree with Chris that it’s not clear what the connection between your comment and the post is. Can you explain?
I admit the connection is pretty vague. Chris mentioned “skill at understanding humans”, which made me recall Eliezer’s sequence on words, and something just clicked, I guess. Sorry for derailing the discussion.
I’m guessing the decision-making role is a more accurate guide to human goals than the usage of words in describing them.
Are you proposing to build FAI based only on people’s revealed preferences? I’m not saying that’s a bad idea, but note that most of our noble-sounding goals disagree with our revealed preferences.
Approval or disapproval of certain behaviors, or of certain algorithms for extrapolation of preference, can also be a kind of decision. And not all behavior follows to any significant extent from decision making, in the sense of following a consequentialist loop (from dependence of utility on action, to action). Finding goals in their decision-making role requires considering instances of decision making, not just of behavior.
You could certainly do that, but the problem still stands, I think.
The goal of extrapolating preferences is to answer questions like “is outcome X better or worse than outcome Y?” Your FAI might use revealed preferences of humans over extrapolation algorithms, or all sorts of other clever ideas. We want to always obtain a definite answer, with no option of saying “sorry, your question is confused”.
But such powerful methods could also be used to obtain yes/no answers to questions about trees falling in the forest, with no option of saying “sorry, your question is confused”. In this case the answers are clearly garbage. What makes you convinced that asking the algorithm about human preferences won’t result in garbage as well?
I distinguish the stage where a formal goal definition is formulated. So elicitation/extrapolation of preferences is part of the goal definition, while judgments* are made according to a decision algorithm that uses that goal definition.
The point about preferences over extrapolation algorithms was meant as an example to break the connotations of “revealed preferences” as a summary of tendencies in real-world behavior. The idea I was describing was to take all sorts of simple hypothetical events associated with humans, including their reflection on various abstract problems (which is not particularly “real world” in the way the phrase “revealed preferences” suggests), and to find a formal goal definition that in some sense has the most explanatory power in accounting for these events as abstract consequentialist decisions made with that goal.
I don’t think these methods could be applied to the tree question. I’m talking about taking events, such as pressing certain buttons on a keyboard, and trying to explain them as consequentialist decisions (“Which goal does pressing the buttons this way optimize?”); a toy sketch of this kind of goal-fitting follows after the footnote. This won’t work with just a few actions, so I don’t see how to apply it to individual utterances about trees, or what use a goal fitted to that behavior would be in resolving the meaning of words.
[*] Or rather decisions: I’m not sure the notion of “outcome” or even “state of the world” can be fixed in this context. By analogy, the output of a program is an abstract property of its source code, and this output (a property of the source code) can sometimes be controlled without controlling the source code itself. If we fix a notion of the state of the world, maybe some of the world’s important abstract properties can be controlled without controlling its state. If that is the case, it’s wrong to define a utility function over possible states of the world, since it would miss the distinctions between different hypothetical abstract properties of the same state of the world.
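As a purely illustrative sketch of this kind of goal-fitting (not the actual proposal): one could score candidate goal definitions by how well they explain a handful of observed choices under a noisy-rational softmax choice model, and keep the one with the most explanatory power. All events, options, candidate goals, and numbers below are made up.

```python
# Toy sketch: fit a "goal" to observed decisions by scoring candidate utility
# functions on how well they explain the observed choices under a softmax
# (noisy-rational) choice model. Everything here is hypothetical.
import math

# Hypothetical observed events: in each situation the person picked one option.
observations = [
    {"options": ["press_A", "press_B"], "chosen": "press_A"},
    {"options": ["press_A", "press_C"], "chosen": "press_C"},
    {"options": ["press_B", "press_C"], "chosen": "press_C"},
]

# Hypothetical candidate goal definitions: a utility for each option.
candidate_goals = {
    "likes_A": {"press_A": 1.0, "press_B": 0.0, "press_C": 0.5},
    "likes_C": {"press_A": 0.3, "press_B": 0.0, "press_C": 1.0},
}

def log_likelihood(utility, events, beta=1.0):
    """Log-probability of the observed choices if the person were (noisily)
    maximizing `utility`; `beta` controls how noisy the choices are."""
    total = 0.0
    for event in events:
        weights = {o: math.exp(beta * utility[o]) for o in event["options"]}
        total += math.log(weights[event["chosen"]] / sum(weights.values()))
    return total

# The candidate goal with the most explanatory power over these events.
best_goal = max(candidate_goals,
                key=lambda g: log_likelihood(candidate_goals[g], observations))
print(best_goal)  # -> "likes_C": the goal that best explains the button presses
```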
A near FAI (revealed preference): everyone loudly complains about conditions while enjoying themselves immensely.
A far FAI (stated preference): everyone loudly proclaims our great success while being miserable.
Yeah. Just because there is no “true meaning” of the word “want” doesn’t mean there won’t be difficult questions about what we really want, once we fix a definition of “want.”
(1) This was not the point of my post.
(2) In fact I see no reason to think what you say is true.
(3) Now I’m double-questioning whether my initial post was clearly written enough.
Does it rely on true meanings of words, particularly? Why not on concepts? Individually, “vibrations of air” and “auditory experiences” can be coherent.
What’s the general algorithm you can use to determine if something like “sound” is a “word” or a “concept”?
If it extrapolates coherently, then it’s a single concept, otherwise it’s a mixture :)
This may actually be doable, even at the present level of technology. You gather a huge text corpus, find the contexts where the word “sound” appears, and do the clustering using some word co-occurrence metric. The result is a list of different meanings of “sound”, and a mapping from each mention to the specific meaning. You can also do this simultaneously for many words together; then it becomes a global optimization problem. (A rough sketch of the single-word version is below.)
Of course, AGI would be able to do this at a deeper level than this trivial syntactic one.
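A rough sketch of the trivial syntactic version, assuming a toy corpus, bag-of-words context vectors, and k-means clustering purely for illustration; a serious attempt would need a large corpus and a better context representation.

```python
# Toy word-sense clustering sketch: cluster the contexts in which "sound"
# occurs, treating each resulting cluster as one candidate "meaning" of the
# word. The corpus and parameters are made up for illustration only.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the sound wave traveled through the air as a vibration",
    "vibration frequency determines the pitch of a sound wave",
    "she heard a strange sound and the experience frightened her",
    "he heard the sound clearly as a vivid auditory experience",
]

# Keep only the contexts in which the target word appears.
contexts = [s for s in corpus if "sound" in s.split()]

# Represent each context by its co-occurring words (ignoring the target itself).
vectorizer = CountVectorizer(stop_words=["sound"])
X = vectorizer.fit_transform(contexts)

# Each cluster stands for one candidate meaning; `labels` maps every mention
# of the word to the meaning it was assigned.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for sentence, label in zip(contexts, labels):
    print(label, "-", sentence)
```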