It is very hard to get an AI to understand the relevant deontological rules. Once you have accomplished that, there is no obvious next step easier and safer than CEV.
Intuitively, it seems easier to determine if a given act violates the rule “do not lie” than the rule “maximize the expected average utility of population x”. Doesn’t this mean that I understand the first rule better than the second?
Yes, but you’re a human, not an AI. Your brain comes factory-equipped with lots of machinery for understanding deontological injunctions, and no (specific) machinery for understanding the concept of expected utility maximization.
Programming each of those concepts into an AI and conveying them to a human are entirely different tasks.
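To make the asymmetry concrete, here is a minimal sketch of the two kinds of check. Everything in it (the act representation, the outcome model, the utility numbers) is an invented placeholder, not anything proposed in this thread:

```python
# A deontological check needs only a predicate on the act itself.
def violates_do_not_lie(act):
    return act["is_assertion"] and not act["believed_true"]

# "Maximize the expected average utility of population x" needs a model of
# every outcome the act could lead to, a probability for each, and a utility
# function over persons -- all hypothetical stand-ins here.
def expected_average_utility(act, outcome_model, utility_of_person):
    total = 0.0
    for outcome, probability in outcome_model(act):
        population = outcome["population_x"]
        average = sum(utility_of_person(p, outcome) for p in population) / len(population)
        total += probability * average
    return total

# The first check is local and cheap; the second is only as reliable as the
# outcome model and utility function we can actually supply.
```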
Logical uncertainty, which is unavoidable no matter how smart you are, blurs the line. An AI won’t “understand” expected utility maximization completely either; it won’t see all the implications, no matter how many computational resources it has. So it needs additional heuristics to guide its decisions where it can’t figure out all the implications. Those are the counterparts of deontological injunctions, although of course they must be subject to revision on sufficient reflection (and what “sufficient” means is itself one of these injunctions, also subject to revision). Some of them will even have normative implications; in fact, that’s one reason preference is not a utility function.
That said, it’s hard to reason about what preferences/morality/meta-ethics/etc. an AI actually converges to if you give it vague deontological injunctions like “be nice” or “produce paperclips”. It’d be really cool if more people were thinking about likely attractors on top of or instead of the recognized universal AI drives.
(Also I’ll note that I agree with Nesov that logical uncertainty / the grounding problem / no low level language etc. problems pose similar difficulties to the ‘you can’t just do ethical injunctions’ problem. That said, humans are able to do moral reasoning somehow, so it can’t be crazy difficult.)
You are making a huge number of assumptions here:
First, the machinery for understanding deontological injunctions: such as? Where is this machinery?
Second, how do you understand the concept of expected utility maximization? Is it not through the highly general machinery of your cortex? And how can we expect that the algorithm of “expected utility maximization” actually represents our best outcome?
Third, the claim that programming those concepts into an AI and conveying them to a human are entirely different tasks: debatable.
“Machinery” was a figure of speech, I’m not saying we’re going to find a deontology lobe. I was referring, for instance, to the point that there are evolutionary reasons why we’d expect to find (as we do) that an understanding of deontological injunctions is fairly universal among humans.
Oops, sorry, I accidentally used the opposite of the word I meant. That should have been “specific”, not “general”. Yes, we understand expected utility maximization with highly general machinery, and in very abstract terms.
EY’s theory, linked in the first post, that deontological injunctions evolved as some sort of additional defense against black-swan events does not appear especially convincing to me. The cortex is intrinsically predictive and consequentialist at a low level, but simple deontological rules are vast computational shortcuts.
An animal brain learns the hard way, the way AIXI does: thoroughly consequentialist at first, but once predictable pattern matches are learned at higher levels, they can sometimes be simplified down to simpler rules for quick decisions.
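A toy sketch of that compression, under invented assumptions (the simulated rollout, the penalty numbers, and the situation/action labels are all placeholders; this illustrates the caching idea, not any actual model of the cortex):

```python
import random
from functools import lru_cache

def simulate_outcome(situation, action, rng):
    """Stand-in for one expensive consequentialist rollout of a world model."""
    penalty = -10.0 if action == "steal" and rng.random() < 0.8 else 0.0
    return penalty + rng.gauss(0.0, 1.0)  # arbitrary placeholder scoring

def consequentialist_value(situation, action, samples=1000):
    """Full evaluation: average many simulated futures (slow, the hard way)."""
    rng = random.Random(0)
    return sum(simulate_outcome(situation, action, rng) for _ in range(samples)) / samples

@lru_cache(maxsize=None)
def cached_rule(situation, action):
    """The shortcut: the expensive evaluation is run once, then its verdict is
    reused as a fixed rule for that class of situation."""
    return consequentialist_value(situation, action) >= 0.0

# cached_rule("market", "steal") evaluates the consequences once and thereafter
# just answers False -- roughly, "do not steal".
```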
Even non-verbal animals find ways to pass down some knowledge to their offspring, but in humans this is vastly amplified through language.
Every time a parent tells a child what to do, the parent is transmitting complex consequentialist results down to the younger mind in the form of simpler cached deontological behaviors. For example, it would be painful for the child to learn a firsthand consequentialist account of why stealing is detrimental (the tribe will punish you).
Once this machinery was in place, it could extend over generations and develop into more complex cultural and religious deontologies. All of this can be accomplished through cortical reinforcement learning as the child develops.
Feral children, for all intents and purposes, act like feral animals. Human minds are cultural/linguistic software phenomena.
I’m not aware of any practical approach to AI which consists of programming concepts directly into an AI. All modern approaches program only the equivalent of an empty brain; the concepts, and the resulting mind, form through learning.
Human concepts are expressed in natural language, and for an AGI to compete with humans it will need to learn extant human knowledge. Learning natural language thus seems like the most practical approach.
The problem is this: if we define an algorithm to represent our best outcome and use it as the standard of rationality, and the algorithm’s predictions then differ significantly from actual human decisions, is that a problem with the algorithm or with the human mind?
If we had an algorithm that represented a human mind perfectly, then that mind would always be rational by that definition.
Even if deontological injunctions are only transmitted through language, they are based on human predispositions (read: brain wiring) to act morally and cooperate, and those predispositions have evolved.
This somewhat applies to animals too; there has been research on altruism in animals.
That he makes assumptions is no point against him; the question is whether those assumptions hold.
To support the first one: the popularity and success of the fallacy of appealing to authority, Milgram’s comments on his experiment, and the “God-shaped hole” theory (well supported).
For the second one: First, it’s not entirely clear we do understand expected utility maximisation. Certainly, I know of no-one who acts as though they are maximising their expected utility. Second, to the extent that we do understand it, I would draw the metaphor of a Turing tarpit—I would say that we understand it only in the sense that we can hack together a bunch of neural processes that do other things, in such a way that they produce the words “expected utility maximisation” and the concept “act to get the most of what you really want”. This is still an understanding, of course, but in no way do we have machinery for that purpose like how we have machinery for orders from authority / deontological injunctions.
“Expected utility maximisation” is, by definition, what actually represents our best outcome. To the extent that it doesn’t, it is a failure of our ability to grasp and apply the concept, not a failure in the concept itself.
As for the third, and for your claim of “debatable”: yes, you could debate it. You would have to stand on some very wide definitions of “entirely” and “different”, and you’d lose the debate. For example: speaking aloud to an AI and speaking aloud to a human are entirely different tasks. Not to mention that conveying a concept to a human carries no instructions, while programming concepts into an AI is all instructions. Another entire difference.
No, it’s based on certain axioms that are not unbreakable in strange contexts, which in turn assume a certain conceptual framework (where you can, say, enumerate possibilities in a certain way).
Name one exception to any axiom other than the third or to the general conceptual framework.
There’s no point in assuming completeness, i.e. being able to compare events that you won’t be choosing between (in the context of a utility function having possible worlds as its domain). Updateless analysis says that you never actually choose between observational events. And there are only so many counterfactuals to consider (which in this setting are more about high-level logical properties of a fixed collection of worlds, which lead to their different utility, and not about the presence or absence of any given possible world; so in one sense even counterfactuals don’t give you nontrivial events).
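For reference, the completeness axiom at issue, in its standard von Neumann–Morgenstern form (a generic textbook statement included for readability; the notation is not the commenters’ own):

```latex
% Completeness: any two alternatives A, B can be compared at all.
\forall A, B : \quad A \succeq B \;\lor\; B \succeq A

% Together with transitivity, continuity and independence, this is what the
% vNM theorem uses to license a utility function u with
% A \succeq B \iff \mathbb{E}[u(A)] \ge \mathbb{E}[u(B)].
```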
On completeness: are there ever actually two events for which this would not hold, if you did need to make such a choice?
On never actually choosing between observational events: I’m not sure what you mean. Outcomes do not have to be observed in order to be chosen between.
On counterfactuals not giving nontrivial events: isn’t this just separating degrees of freedom and assuming that some don’t affect others? It can be derived from the utility axioms.