I must confess I do not understand what you just said at all. Specifically:
the second sentence: could you please expand on that?
I think I get that the function does not evaluate itself at all, and if you ask it, it just says “it’s just good ’cos it is, all right?”
Why is this a feature? (I suspect the password is “Löb’s theorem”, and I only almost understand why.)
The last bit appears to be what I meant by “therefore it deserves to control the entire future.” It strikes me as insufficient reason to conclude that this can in no way be improved, ever.
Does the sequence show a map of how to build metamorality from the ground up, much as writing the friendly AI will need to work from the ground up?
the second sentence: could you please expand on that?
I’ll try: on this view, any claim that a fundamental/terminal moral goal ‘is good’ reduces to a tautology, because “good” doesn’t have anything to it besides these goals. The speaker’s definition of goodness makes every true claim of this kind true by definition. (Though the more practical statements involve inference. I started to say it must all be logical inference, realized EY could not possibly have said that, and confirmed that in fact he did not.)
I get that the function does not evaluate itself at all,
Though technically it may see the act of caring about goodness as good. So I have to qualify what I said before that way.
Why is this a feature?
Because if the function could look at the mechanical, causal steps it takes, and declare them perfectly reliable, it would lead to a flat self-contradiction by Löb’s Theorem. The other way looks like a contradiction but isn’t. (We think.)
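To spell out the step I’m leaning on (this is my own gloss in standard provability-logic notation, not a quote from the sequence; read \Box P as “the system proves P”):

% Löb’s theorem, stated in provability-logic notation:
\[
  \vdash \; \Box(\Box P \to P) \to \Box P
\]

The corollary is that a consistent system which proved its own soundness schema, \Box P \to P for every sentence P, would by Löb’s theorem prove every P outright, falsehoods included. So “I have checked that my own steps are perfectly reliable” is exactly the claim the function cannot coherently make about itself.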
Though technically it may see the act of caring about goodness as good. So I have to qualify what I said before that way.
Ooh yeah, didn’t spot that one. (As someone who spent a lot of time when younger thinking about this and trying to be a good person, I certainly should have spotted this.)
Thank you, this helps a lot.