(I find new obvious things everywhere after the recent realization that any explicit consideration an agent knows is subject to the whole agent’s judgment, even “preference” or “logical correctness”. This also explains a bit of our talking past each other in the other thread.)
I don’t have much idea what you mean here. This seems important enough to write up as more than a parenthetical remark.
I spent a lot of time laboring under the intuition that there’s some “preference” thingie that summarizes all we care about, that we can “extract” from (define using a reference to) people and have an AI optimize it. In the lingo of meta-ethics, that would be “right” or “morality”, and it distanced itself from the overly specific “utility”, which also has the disadvantage of forgetting that the prior is essential.
Then, over the last few months, as I was capitalizing on finally understanding UDT in May 2010 (despite having convinced a lot of people that I understood it long before that, I had completely failed to get the essential aspect of controlling the referents of fixed definitions, and only recognized in retrospect that what I had figured out by then was actually UDT), I noticed that a decision problem requires many more essential parts than just preference, and so to specify what people care about, we need a whole human decision problem. But the intuition remained linked to preference in particular, which was by then merely one part of the decision problem, and so I failed to notice that it is now the whole decision problem, not preference, that is analogous to “right” and “morality” (but not quite, since that decision problem still won’t be the definition of right; it can be judged in turn), and that the whole agent implementing such a decision problem is the best tool available to judge it.
This agent, in particular, can find itself judging its own preference, or its own inference system, or its whole architecture, which might or might not specify an explicit inference system as one of its parts, and so on. Whatever explicit consideration it’s moved by, that is, whatever module of the agent (decision problem) we consider, there is a self-improvement decision problem in which the agent replaces that module with something else, and things other than that module can have a hand in that decision.
Also, there’s little point in distinguishing “decision problem” and “agent”, even though there is a point in distinguishing a decision problem and what’s right. A decision problem is merely a set of tricks that the agent is willing to use, as is the agent’s own implementation. What that set of tricks wants to do is not specified in any of the tricks, and the tricks can well fail the agent.
When we apply these considerations to humans, it follows that no human can know what they care about; they can only guess (and, indeed, design) heuristic rules for figuring out what they care about, and the same applies when they construct FAIs. So extracting “preference” exactly is not possible; instead, an FAI should be seen as a heuristic, one that would still be subject to moral judgment and probably won’t capture the whole of it, just as humans themselves don’t implement what’s right reliably. Recognizing that an FAI won’t be perfect, and that the things it does are merely ways of more reliably doing the right thing, looks like an important intuition.
(This is admittedly very sketchy, and I don’t expect it to get significantly better for at least a few months. I could talk more (thus describing more of the intuition), but not more clearly, because I don’t understand this well myself. The alternative would be to write up some unfinished work that would clarify each particular intuition, but that would likely be of no lasting value, and so should wait for a better rendition instead.)
it follows that no human can know what they care about
This sounds weird, like you’ve driven off a cliff or something. A human mind is a computer of finite complexity. If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty which may or may not be reduced by applying powerful math. Or do I misunderstand you? Maybe the following two questions will help clarify things:
a) Can a paperclipper know what it cares about?
b) How is a human fundamentally different from a paperclipper with respect to (a)?
If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty.
Hence “explicit considerations”, that is, not up to logical uncertainty. Also, you need to know that you care about logic to talk of “up to logical uncertainty” as getting you closer to what you want.
Similarly (unhelpfully), everyone knows what they should do up to moral uncertainty.
Can a paperclipper know what it cares about?
No, at least not while it’s still an agent in the same sense, so that it still has the problem of self-improvement on its hands and hasn’t disassembled itself into actual paperclips. To a human, the paperclipper’s philosophy of precise reasoning about paperclips won’t look like an adequate activity to spend resources on, but for the paperclipper, understanding paperclips really well is important.
OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality? I doubt it.
ETA:
Also, you need to know that you care about logic to talk of “up to logical uncertainty” as getting you closer to what you want.
I defy the possibility that we may “not care about logic” in the sense that you suggest.
OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality?
(Not “morality” here, of course, but its counterpart in the analogy.)
What is to guide its self-improvement? How is it to best convert the Sun into more computing machinery, in the face of logical uncertainty about the consequences of such an action? What is meant by “actually proving it”? Does quantum suicide count as a method for achieving its goal? When should it risk performing an action in the environment, given that it could damage its own hardware as a result? When should it risk improving its inference system, given the chance that the improvement will turn out to increase the time necessary to complete the proof, perhaps even pushing that time beyond what’s physically available in our universe? Heuristics everywhere, and no easy methods for deciding what should be done.
In a decision between what’s logical and what’s right, you ought to choose what’s right.
If you can summarize your reasons for thinking that’s actually a conflict that can arise for me, I’d be very interested in them.
Consider a possible self-improvement that changes your inference system so that it (1) becomes significantly more efficient at inferring the kinds of facts that help you make right decisions, and (2) acquires an additional tiny chance of being inconsistent. If all you care about is correctness, then notice that implementing this self-improvement will make you less correct: it will increase the probability that you’ll produce incorrect inferences in the future. On the other hand, the expected utility of this decision argues that you should take it. This is a conflict, resolved either by self-improving or not.
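(A minimal toy sketch of the tradeoff just described, under made-up assumptions: the expected_utility helper, the probabilities, and the utility values below are all hypothetical, chosen only to show how the correctness criterion and expected utility can disagree.)

```python
# Toy model of the self-improvement decision sketched above.
# Every number below is invented purely for illustration.

def expected_utility(p_inconsistent, value_if_consistent, value_if_inconsistent):
    """Expected utility of running with a given inference system."""
    return ((1 - p_inconsistent) * value_if_consistent
            + p_inconsistent * value_if_inconsistent)

# Current inference system: slower, but assumed consistent.
keep_current = expected_utility(
    p_inconsistent=0.0,
    value_if_consistent=100.0,   # value of the right decisions it supports
    value_if_inconsistent=0.0,
)

# Upgraded system: much better at supporting right decisions,
# but with a tiny added chance of being inconsistent.
adopt_upgrade = expected_utility(
    p_inconsistent=1e-6,
    value_if_consistent=150.0,
    value_if_inconsistent=0.0,
)

print(keep_current, adopt_upgrade)  # 100.0 vs ~149.99985

# Judged purely by correctness, the upgrade is strictly worse: it adds a
# nonzero probability of producing incorrect inferences. Judged by expected
# utility, it is clearly better. That is the conflict.
```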
That’s fair. Yes, agreed that this is a decision between maximizing my odds of being logical and maximizing my odds of being right, which is a legitimate example of the conflict you implied. And I guess I agree that if being right has high utility then it’s best to choose what’s right.
And I guess I agree that if being right has high utility then it’s best to choose what’s right.
Seeking high utility is right (and following the rules of logic is right), not the other way around. “Right” is the unreachable standard for how things should be, of which “utility” is merely a heuristic representation.
It isn’t clear to me what that statement, or its negation, actually implies about the world. But I certainly don’t think it’s false.
I’m generally sympathetic towards these intuitions, but I have a few reservations:
1. Isn’t it possible that it only looks like “heuristics all the way down” because we haven’t dug deep enough yet? Perhaps in the not too distant future, someone will come up with some insights that will make everything clear, and we can just implement that.
2. What is the nature of morality according to your approach? You say that a human can’t know what they care about (which I assume you use interchangeably with “right”, correct me if I’m wrong here). Is it because they can’t, in principle, fully unfold the logical definition of right, or is it that they can’t even define “right” in any precise way?
3. This part assumes that your answer to the last question is “the latter”. Usually when someone says “heuristic”, they have a fully precise theory or problem statement that the heuristic is supposed to be an approximate solution to. How is an agent supposed to design a set of heuristics without such a precise definition to guide it? Also, if the agent itself uses the words “morality” or “right”, what do they refer to?
4. If the answer to the question in 2 is “the former”, do you have any idea what the precise definition of “right” looks like?
Isn’t it possible that it only looks like “heuristics all the way down” because we haven’t dug deep enough yet?
Everything’s possible, but it doesn’t seem plausible at this point, and certainly not at the human level. To conclude that something is not a heuristic but the thing itself, one would need more certainty than can be expected on such a question.
What is the nature of morality according to your approach? You say that a human can’t know what they care about (which I assume you use interchangeably with “right”, correct me if I’m wrong here).
I did use that interchangeably.
Is it because they can’t, in principle, fully unfold the logical definition of right, or is it that they can’t even define “right” in any precise way?
Both (the latter). Having an explicit definition would correspond to the “preference” I discussed in the grandparent comment. But if we talk merely of “precise”, then at least in principle we could hope to obtain a significantly more precise description, maybe even at the human level, which is what meta-ethics should strive to give us. Every useful heuristic is an element of such a description, and some of the heuristics, such as the laws of physics, are very precise.
How is an agent supposed to design a set of heuristics without such a precise definition to guide it?
Using the current heuristics, its current implementation, which is understood to be fallible.
Also, if the agent itself uses the words “morality” or “right”, what do they refer to?
I don’t know (knowing would give a definition). To the extent it’s known, they refer to the current heuristics (a long list), maybe brains.
Essentially, what you’re describing is just the situation that we are actually faced with. I mean, when I use the word “right” I think I mean something, but I don’t know what. And I have to use my current heuristics, my current implementation, without having a precise theory to guide me.
And you’re saying that this situation is unlikely to change significantly by the time we build an FAI, so the best we can expect to do is equivalent to a group of uploads improving themselves to the best of their abilities.
I tend to agree with this (although I think I assign a higher probability that someone does make a breakthrough than you perhaps do), but it doesn’t really constitute a meta-ethics, at least not in the sense that Eliezer and philosophers use that word.
Essentially, what you’re describing is just the situation that we are actually faced with.
I’m glad it all adds up to normality, given the amount of ink I spilled getting to this point.
And you’re saying that you don’t expect this situation to change significantly by the time we build an FAI, so the best we can do is equivalent to a group of uploads improving themselves to the best of their abilities.
Not necessarily. The uploads construct could in principle be made abstract, with efficient algorithms figuring out the result of the process much more quickly than if it were actually simulated. More specific heuristics could be figured out that make use of computational resources to make better progress, maybe in the early stages by the uploads construct.
it doesn’t really constitute a meta-ethics, at least not in the sense that Eliezer and philosophers use that word.
I’m not sure about that. If that’s indeed all we can say about morality right now, then that’s what we have to say, even if it doesn’t belong to the expected literary genre. It’s too easy to invent fake explanations, and an absence of conclusions invites that, whereas a negative conclusion could focus the effort elsewhere.
(Also, I don’t remember particular points on which my current view disagrees with Eliezer’s sequence, although I’d need to re-read it to have a better idea, which I really should, since I only read it as it was posted, when my understanding of the area was zilch.)
I second this request. In particular, please clarify whether “preference” and “logical correctness” are presented here as examples of “explicit considerations”. And should “whole agent” be parsed as including all the sub-agents? Or perhaps as the extrapolated agent?
Perhaps he’s referring to the part of CEV that says “extrapolated as we wish that extrapolated, interpreted as we wish that interpreted”. Even logical coherence becomes in this way a focus of the extrapolation dynamic, and if this criterion should be changed to something else—as judged by the whole of our extrapolated morality in a strange-loopy way—well, so be it. The dynamic should reflect on itself and consider the foundational assumptions it was built upon, including the compellingness of the basic logic we are currently so certain about—and, of course, only if it really should reflect on itself in this way.
Anyway, I’d really like to hear what Vladimir has to say about this. Even though it’s often quite hard for me to parse his writings, he does seem to clear things up for me or at least direct my attention towards some new, unexplored areas...