I sure hope you have a tendency to eventually converge to something that makes sense to me… Do you agree that what you post there is the product of an “initial exploration” phase that would get significantly revised and mostly discarded on the scale of months? (I had a blog just 1.5 years ago that I currently see this way, but didn’t at the time...)
Have you seen Paul’s latest post yet? It seems much more well formed than his previous posts on the subject.
I left a comment there, but it’s still under moderation, so I’ll copy it here.
For example, if we suppose that the U-maximizer can carry out any reasoning that we can carry out, then the U-maximizer knows to avoid anything which we suspect would be bad according to U (for example, torturing humans).
This seems like a problematic part of the argument. The reason we think torturing humans would be bad according to U is that we have an informal model of humans in our mind, and we know that U is actually a simulation of something that contains a human. Our “suspicion” does not come from studying U as a mathematical object, which is presumably all that a U-maximizer would do, since all it has is a formal definition of U and not our informal knowledge of it.
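To make this concrete, here is a minimal sketch (the names `U`, `simulate_judge`, and the outcome strings are all hypothetical, not from Paul's post) of the asymmetry being described: the U-maximizer gets U only as an evaluable formal object, so any conclusion like "torture scores badly under U" has to be derived from that definition, whereas our own suspicion comes from an informal model of what the definition contains.

```python
# Illustrative sketch only; all names are hypothetical.

def simulate_judge(outcome: str) -> float:
    """Stand-in for an enormous computation that happens to contain a human.
    We know that from the informal description; formally, it is just a very
    long composition of primitive operations."""
    # (elided: a brain-emulation-scale computation)
    return 0.0

def U(outcome: str) -> float:
    # This definition, taken as a bare mathematical object, is everything
    # the U-maximizer has access to.
    return simulate_judge(outcome)

# The maximizer can evaluate or formally analyze U...
value = U("some candidate outcome")

# ...but it cannot simply import our informal knowledge that U's innards
# include a human, and hence that torturing humans would score badly under U.
```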
Have you seen Paul’s latest post yet? It seems much more well formed than his previous posts on the subject.
I agree, though it doesn’t go as far afield as many of the other posts. It’s actually another plausible winning scenario that I forgot about in the recent discussions: implement WBE via AGI (as opposed to the normal engineering route, thus winning the WBE race), and then solve the remaining problems from within. It might be possible to implement this even while the FAI puzzle is not yet completely solved.
Could you clarify your remarks? This seems to be a source of persistent mild disagreement, but I’m not really sure what it is. I am aware of some inferential distance between us on what seem to me to be technical aspects of decision theory, but your comments seem to require either some misunderstanding or some other not-yet-identified inferential chasm.
In the future I expect to feel basically the same way about this writing, particularly the stuff in the category “Formal Definitions,” as I do today about these posts: not safe for use, but important for someone to think about and describe, if only to see more exactly why they are dangerous approaches. I expect the formal assertions I’ve made, to the extent I’ve made formal assertions, to continue to look reasonable. I am open to the possibility that there may be surprises.
(For example, when writing those old LW posts I didn’t yet understand exactly how weird TDT agents’ behavior might look to someone used to thinking in terms of CDT, so while the cryptographic boxing stuff still holds up fine, the manipulation of boxed AIs doesn’t; the new work can be expressed with much less wiggle room, but there may still be surprises.)