Lots of interesting points, but on your final paragraph, is a theory that models the agent as part of its environment necessarily possible? Since the model is part of the agent, it would have to include the model as part of the model. I suppose that isn’t an outright contradiction, as there are of course mathematical structures with proper parts equivalent to the whole, but does it seem likely that plausible models human agents can construct could be like that?
It seems to me that there are logical constraints on self-knowledge, related to the well-known paradoxes associated with self-reference. I further think, though it would perhaps go too far afield for a mere comment to specify in detail, that quite a number of philosophical discussions of free will have touched on this issue (though usually none too clearly). Perhaps causal decision theory is partly motivated by people thinking that no version of evidential decision theory will be able to escape gaps in the evidence generated by the limitations on self-knowledge (I could believe this was part of the motivation of David Lewis). Note that this problem doesn’t require there to be anything spooky about human choices (if the problem is a restriction on self-knowledge, humans could still be perfectly determined, and any particular human could be perfectly predicted by someone other than themselves).
Pretty sure humans normally model themselves as part of the environment. Seems a bit excessive to conjecture the impossibility of something humans do every day (even if “approximately”) without particularly strong evidence. (Note that quines exist and people are able to understand that brains are made of neurons.)
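To make the quine point concrete, here is a minimal self-reproducing program (in Python, as an illustration): running it prints its own source, showing that a finite structure can contain a complete description of itself without infinite regress.

```python
# A minimal quine: running this prints exactly the two code lines
# below, so the program contains a full description of itself.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The trick is that the string `s` serves double duty, as data (what gets printed) and as a template describing the code that does the printing.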
“Approximately” would be important. A lot of the discussions of decision theory seem to be trying to come up with something logically perfect, some theory which in principle could always give the best answer (though obviously no human would ever implement any theory perfectly). It thus seems relevant whether perfection is possible even in principle. If it isn’t, then the evaluation of decision theories must somehow compare the severity of flaws, rather than seeking flawlessness, and the discussions around here don’t generally seem to go that way.
That being said, I’m not sure I agree here anyway. It seems that people’s minds are sufficiently complicated and disunified that it is certainly possible for part of a person to model another part of the same person. I am not certain that self-modeling ever takes any other form; it is not obvious that it is ever possible for part of a person to successfully model that exact part.
I’m a bit tired at the moment, but my more or less cached reply is “use a coarse-grained simulation of yourself.”
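One hypothetical way to sketch that idea (all names here are illustrative, not anyone’s actual proposal): the agent predicts its own choice by running a simplified copy of its decision rule, one that omits the self-simulation step, so the regress bottoms out.

```python
# Hypothetical sketch of "coarse-grained self-simulation":
# the full agent predicts its own behavior by running a stripped-down
# model of itself that does no further self-modeling.

def coarse_choice(options, utility):
    # The coarse self-model: just pick the highest-utility option,
    # with no nested simulation of the chooser.
    return max(options, key=utility)

def choose(options, utility):
    # The full agent first predicts what its coarse model would do,
    # then acts on that prediction (here it simply confirms it).
    predicted = coarse_choice(options, utility)
    return predicted

print(choose(["tea", "coffee"], {"tea": 1, "coffee": 2}.get))
# prints "coffee"
```

The point of the sketch is only that the inner model is strictly simpler than the agent running it, which is what lets self-prediction terminate.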
Who knows? I think this is a really interesting question and hopefully some of the work going on in MIRI workshops will be relevant to answering it.