If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty.
Hence “explicit considerations”, that is, not up to logical uncertainty. Also, you need to know that you care about logic to talk of “up to logical uncertainty” as getting you closer to what you want.
Similarly (unhelpfully), everyone knows what they should do up to moral uncertainty.
Can a paperclipper know what it cares about?
No, at least while it’s still an agent in the same sense, so that it still has the problem of self-improvement on its hands, and hasn’t disassembled itself into actual paperclips. To a human, a philosophy of precise reasoning about paperclips won’t look like an activity worth spending resources on, but for the paperclipper, understanding paperclips really well is important.
OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality? I doubt it.
ETA:
Also, you need to know that you care about logic to talk of “up to logical uncertainty” as getting you closer to what you want.
I defy the possibility that we may “not care about logic” in the sense that you suggest.
OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality?
(Not “morality” here, of course, but its counterpart in the analogy.)
What is to guide its self-improvement? How is it to best convert the Sun into more computing machinery, in the face of logical uncertainty about the consequences of such an action? What is meant by “actually proving it”? Does quantum suicide count as a method for achieving its goal? When should it risk performing an action in the environment, given that it could damage its own hardware as a result? When should it risk improving its inference system, given that the improvement might turn out to increase the time necessary to perform the proof, perhaps even pushing that time beyond what’s physically available in our universe? Heuristics everywhere, no easy methods for deciding what should be done.
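For concreteness, the formal target such a prover would be handed might look something like the statement below. This is only a sketch of the conjecture itself in Lean-style notation, with a hand-rolled primality predicate so it depends on no library; nothing about the prover’s actual machinery is implied.

```lean
-- Sketch: Goldbach's conjecture stated as a formal proposition -- the kind
-- of target the hypothetical prover is pointed at.
-- IsPrime is defined by hand here to keep the sketch self-contained.
def IsPrime (p : Nat) : Prop :=
  2 ≤ p ∧ ∀ d : Nat, d ∣ p → d = 1 ∨ d = p

def GoldbachConjecture : Prop :=
  ∀ n : Nat, 2 < n → n % 2 = 0 →
    ∃ p q : Nat, IsPrime p ∧ IsPrime q ∧ p + q = n
```

Stating the proposition is the easy part; the questions above are all about what the agent should do on the way to a proof.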
In a decision between what’s logical and what’s right, you ought to choose what’s right.
If you can summarize your reasons for thinking that’s actually a conflict that can arise for me, I’d be very interested in them.
Consider a possible self-improvement that changes your inference system in such a way that it (1) becomes significantly more efficient at inferring the kinds of facts that help you make right decisions, and (2) acquires an additional tiny chance of being inconsistent. If all you care about is correctness, then notice that implementing this self-improvement will make you less correct: it will increase the probability that you’ll produce incorrect inferences in the future. On the other hand, the expected utility of this decision argues that you should take it. This is a conflict, resolved either by self-improving or not.
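To make the tradeoff concrete, here is a toy expected-utility comparison for exactly this choice. All the numbers (the inconsistency probability and the three utilities) are made-up illustrative assumptions, not anything taken from the discussion:

```python
# Toy expected-utility comparison for the self-improvement described above.
# Every number here is an illustrative assumption.

P_INCONSISTENT = 1e-6      # added chance the upgraded inference system is inconsistent
U_UPGRADE_OK = 100.0       # utility if the upgrade works and decisions improve
U_DECLINE = 60.0           # utility of declining the upgrade and staying as-is
U_INCONSISTENT = -1000.0   # utility if the upgrade quietly breaks soundness

eu_upgrade = (1 - P_INCONSISTENT) * U_UPGRADE_OK + P_INCONSISTENT * U_INCONSISTENT
eu_decline = U_DECLINE

print(f"EU(upgrade) = {eu_upgrade:.4f}")  # ~99.9989
print(f"EU(decline) = {eu_decline:.4f}")  # 60.0000

# On these numbers the expected-utility calculation favors upgrading, even
# though upgrading strictly increases the probability of ever producing an
# incorrect inference -- which is the conflict being pointed at.
```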
That’s fair. Yes, agreed that this is a decision between maximizing my odds of being logical and maximizing my odds of being right, which is a legitimate example of the conflict you implied. And I guess I agree that if being right has high utility then it’s best to choose what’s right.
Thanks.
Seeking high utility is right (and following the rules of logic is right), not the other way around. “Right” is the unreachable standard for how things should be, of which “utility” is merely a heuristic representation.
It isn’t clear to me what that statement, or its negation, actually implies about the world. But I certainly don’t think it’s false.