Could you clarify your remarks? This seems to be a source of persistent mild disagreement, but I’m not really sure what it is. I am aware of some inferential distance between us on what seem to me to be technical aspects of decision theory, but your comments either require some misunderstanding or some other not-yet-identified inferential chasm.
In the future I expect to feel basically the same way about this writing, particularly the stuff in the category “Formal Definitions,” as I do today about these posts: not safe for use, but important for someone to think about and describe, if only to see more exactly why they are dangerous approaches. I expect the formal assertions I’ve made, to the extent I’ve made formal assertions, to continue to look reasonable. I am open to the possibility that there may be surprises.
(For example, when writing those old LW posts I didn’t yet understand exactly how weird TDT agents’ behavior might look to someone used to thinking in terms of CDT, so while the cryptographic boxing stuff still holds up fine, the manipulation of boxed AIs doesn’t. The new work can be expressed with much less wiggle room, but there may still be surprises.)