Are you proposing to build FAI based only on people’s revealed preferences? I’m not saying that’s a bad idea, but note that most of our noble-sounding goals disagree with our revealed preferences.
Approval or disapproval of certain behaviors, or of certain algorithms for extrapolation of preference, can also be a kind of decision. And not all behavior follows to any significant extent from decision making, in the sense of a consequentialist loop (from the dependence of utility on action, to the action). Finding goals in their decision-making role requires considering instances of decision making, not just of behavior.
You could certainly do that, but the problem still stands, I think.
The goal of extrapolating preferences is to answer questions like “is outcome X better or worse than outcome Y?” Your FAI might use revealed preferences of humans over extrapolation algorithms, or all sorts of other clever ideas. We want to always obtain a definite answer, with no option of saying “sorry, your question is confused”.
But such powerful methods could also be used to obtain yes/no answers to questions about trees falling in the forest, with no option of saying “sorry, your question is confused”. In this case the answers are clearly garbage. What makes you convinced that asking the algorithm about human preferences won’t result in garbage as well?
The goal of extrapolating preferences is to answer questions like “is outcome X better or worse than outcome Y?” … We want to always obtain a definite answer, with no option of saying “sorry, your question is confused”.
I distinguish the stage where a formal goal definition is formulated. So elicitation/extrapolation of preferences is part of the goal definition, while judgments* are made according to a decision algorithm that uses that goal definition.
Your FAI might use revealed preferences of humans over extrapolation algorithms, or all sorts of other clever ideas.
This was meant as an example to break the connotations of “revealed preferences” as a summary of tendencies in real-world behavior. The idea I was describing was to take all sorts of simple hypothetical events associated with humans, including their reflection on various abstract problems (which is not particularly “real world” in the way the phrase “revealed preferences” suggests), and to find a formal goal definition that, in some sense, has the most explanatory power when these events are treated as abstract consequentialist decisions made with that goal.
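To make the fitting step a bit more concrete, here is a minimal toy sketch in Python. The event format, the candidate goals, and the Boltzmann-style choice model are all my illustrative assumptions, not a claim about how an actual goal definition would be represented; the only point is the shape of the procedure, scoring candidate goals by how well they explain recorded decisions and keeping the best one.

```python
# Toy sketch only: fit a "goal" to recorded decisions by explanatory power.
# The event format, candidate goals, and choice model are all assumptions.
import math
from typing import Dict, List, Tuple

Event = Tuple[List[str], str]   # (options that were available, option actually taken)
Goal = Dict[str, float]         # a candidate goal: utility assigned to each option

def explanatory_power(goal: Goal, events: List[Event], beta: float = 3.0) -> float:
    """Log-likelihood of the recorded choices if they noisily optimized `goal`."""
    total = 0.0
    for options, chosen in events:
        z = sum(math.exp(beta * goal[o]) for o in options)
        total += math.log(math.exp(beta * goal[chosen]) / z)
    return total

def fit_goal(candidates: Dict[str, Goal], events: List[Event]) -> str:
    """Name of the candidate goal that best explains the events as decisions."""
    return max(candidates, key=lambda name: explanatory_power(candidates[name], events))

# Toy data: a few keystroke-level decisions about an email draft.
events = [
    (["send", "discard"], "send"),
    (["send", "edit"], "edit"),
    (["edit", "discard"], "edit"),
]
candidates = {
    "wants the message delivered": {"send": 1.0, "edit": 0.6, "discard": 0.0},
    "wants the message gone":      {"send": 0.0, "edit": 0.4, "discard": 1.0},
}
print(fit_goal(candidates, events))  # -> "wants the message delivered"
```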
But such powerful methods could also be used to obtain yes/no answers to questions about trees falling in the forest
I don’t think so. I’m talking about taking events, such as pressing certain buttons on a keyboard, and trying to explain them as consequentialist decisions (“Which goal does pressing the buttons this way optimize?”). This won’t work with just a few actions, so I don’t see how to apply it to individual utterances about trees, or what use a goal fitted to that behavior would be in resolving the meaning of words.
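A tiny self-contained illustration of why a few actions are not enough: a single observed choice is explained perfectly by every candidate goal that merely ranks that choice first, however those goals disagree about everything else. The options below are placeholders.

```python
# One observed action barely constrains the space of candidate goals.
from itertools import permutations

options = ["A", "B", "C", "D", "E", "F"]  # six hypothetical things one could do
chosen = "A"                              # the single action we actually observed

rankings = list(permutations(options))    # each ranking stands in for one candidate goal
fits = [r for r in rankings if r[0] == chosen]
print(f"{len(fits)} of {len(rankings)} candidate goals explain the single action perfectly")
# -> 120 of 720: the goals that fit still disagree about nearly everything else.
```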
[*] Or rather decisions: I’m not sure the notion of “outcome”, or even “state of the world”, can be fixed in this context. By analogy, the output of a program is an abstract property of its source code, and this output (a property of the source code) can sometimes be controlled without controlling the source code itself. If we fix a notion of the state of the world, maybe some of the world’s important abstract properties can likewise be controlled without controlling its state. If that is the case, it’s wrong to define a utility function over possible states of the world, since it would miss the distinctions between different hypothetical abstract properties of the same state of the world.
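A loose toy rendering of the analogy in this footnote, with all names made up: the output of `world` below is an abstract property of its fixed description, and that property is settled by what `agent` returns, without the description being edited. A utility that only reads the fixed description cannot register the distinction; a utility over the abstract property can.

```python
# Toy illustration of the footnote's analogy; every name here is made up.

def agent() -> str:
    # The decision under consideration. Changing this return value changes
    # world()'s output, an abstract property of a description that stays fixed.
    return "cooperate"

WORLD_SOURCE = "return 10 if agent() == 'cooperate' else 0"  # the fixed description

def world() -> int:
    # The world's dynamics, literally the fixed description above.
    return 10 if agent() == "cooperate" else 0

def utility_over_description(description: str) -> float:
    # Stand-in for a utility over "states": it sees only the fixed description,
    # so it assigns the same value whatever agent() decides.
    return float(len(description))

def utility_over_property(output: int) -> float:
    # Utility over the abstract property (the world's output) instead.
    return float(output)

print(utility_over_description(WORLD_SOURCE))  # unchanged regardless of the decision
print(utility_over_property(world()))          # tracks the decision
```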
a near FAI (revealed preference): everyone loudly complains about conditions while enjoying themselves immensely.
a far FAI (stated preference): everyone loudly proclaims our great success while being miserable.