Re embedded agency, and related problems like finding the right theory of counterfactuals:
I feel like these are just the kinds of philosophical questions that don’t ever get answered? (And are instead “dissolved” in the Wittgensteinian sense.) Consider, for instance, the Sorites paradox: well, that’s just how language works, man. Why’d you expect to have a solution for that? Why’d you expect every semantically meaningful question to have an answer adequate to the standards of science?
(A related perspective I’ve heard: “To tell an AI to produce a cancer cure and do nothing else, let’s delineate all consequences that are inherent, necessary, intended, or common for any cancer cure” (which might be equivalent to solving counterfactuals). Again, by Wittgenstein’s intuitions this will be a fuzzy, family-resemblance kind of thing, rather than there being a Socratic “simple essence” (a simple definition) of the object/event.)
Maybe I just don’t understand the mathematical form in which these issues seem to present themselves, with a missing slot for an answer (and some of the answers sought by embedded agency do seem to be compatible with the nature of physical reality). But on some level they just feel like “getting well-defined-enough human concepts into the AI”, and such well-defined human concepts (given all at once, factual and complete, as opposed to being potentially encoded in human society) might not exist, similar to how a satisfying population ethics doesn’t exist, or how the tails come apart, etc.
Take “defining counterfactuals correctly” as an example. It feels like there’s no ultimate fact of the matter here, just “whatever is most convenient for our reasoning, or for predicting correctly, etc.”, and there might not be a definition as convenient as we expect there to be. Maybe there’s no mathematically robust definition of counterfactuals, and every conceivable definition fails in some corner of example space. That wouldn’t be so surprising; after all, reality itself doesn’t work that way. Maybe our sense that “if X had been the case then Y would have happened” is intuitive, correct, and useful is just a jumble of lived and hard-coded experience, with no compact core to it other than “approximately the whole of human concept-space”.
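As a toy illustration of how two reasonable formalizations of “if X had been the case” can simply disagree, here’s a minimal sketch (the model, names, and numbers are mine, purely illustrative): reading the counterfactual as evidential conditioning versus as a surgical intervention on a tiny structural model gives different answers, and neither is “the” correct one independent of what you want the notion for.

```python
# Toy structural model (made up for illustration):
#   U ~ Bernoulli(0.5)   -- hidden common cause
#   X := U               -- X merely reflects U
#   Y := U               -- Y is caused by U, not by X
worlds = [{"U": u, "X": u, "Y": u} for u in (0, 1)]
prob = {0: 0.5, 1: 0.5}

def evidential(x):
    """Reading 1: condition on having observed X = x."""
    mass = sum(prob[w["U"]] for w in worlds if w["X"] == x)
    hit = sum(prob[w["U"]] for w in worlds if w["X"] == x and w["Y"] == 1)
    return hit / mass

def interventional(x):
    """Reading 2: surgically force X = x, leave U's distribution alone."""
    hit = 0.0
    for u in (0, 1):
        y = u  # Y's mechanism depends on U only, ignoring the forced X
        if y == 1:
            hit += prob[u]
    return hit

print(evidential(1))      # 1.0 -- "had X been 1, Y would have been 1"
print(interventional(1))  # 0.5 -- "forcing X = 1 tells us nothing about Y"
```

Which reading you prefer depends on what the counterfactual is for (prediction, planning, assigning credit), which is the “whatever is most convenient” point above.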
The problem of counterfactuals is not just the problem of defining them.
The problem of counterfactuals exists for rationalists only: counterfactuals are not considered a problem in mainstream philosophy.
The rationalist problem of counterfactuals is eminently dissolvable: you start by making realistic assumptions about agents, namely that they have incomplete world-models and imperfect self-knowledge.
https://www.lesswrong.com/posts/yBdDXXmLYejrcPPv2/two-alternatives-to-logical-counterfactuals
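A minimal sketch of the kind of dissolution being gestured at, not an implementation of the linked post (the class, names, and numbers are mine, purely illustrative): once the agent only has a coarse, possibly wrong subjective model of the world and of itself, “what would happen if I did a?” is just a query against that model rather than a fact about reality.

```python
# Illustrative sketch: an agent with an incomplete world-model. Its
# "counterfactuals" are nothing more than its own model's predictions,
# which may well disagree with the true environment.
class BoundedAgent:
    def __init__(self, model):
        # model: action -> {outcome: subjective probability}
        self.model = model

    def counterfactual(self, action):
        """The agent's counterfactual: its model's prediction for `action`."""
        return self.model[action]

    def act(self, utility):
        # Expected utility under the agent's (possibly wrong) model.
        return max(self.model,
                   key=lambda a: sum(p * utility(o)
                                     for o, p in self.model[a].items()))

agent = BoundedAgent({"stay": {"safe": 1.0},
                      "leap": {"safe": 0.6, "hurt": 0.4}})
print(agent.counterfactual("leap"))                       # {'safe': 0.6, 'hurt': 0.4}
print(agent.act(lambda o: {"safe": 1, "hurt": -10}[o]))   # 'stay'
```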
Re Embedded Agency problems, my hope is that the successors to the physically-universal-computer / cellular-automaton literature have at least the right framework to answer a lot of the questions asked, because they force you to confront some of the consequences of embeddedness, especially the fact that under physical universality you can’t treat the machine as special: in particular, you aren’t allowed to use the abstraction in which there’s a clean division between machine states and data states, with the former operating on the latter.
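A toy illustration of the constraint being described, though not of physical universality itself (exhibiting a physically universal CA is much harder): in an ordinary cellular automaton, anything you’d want to call “the machine” is just a region of cells updated by exactly the same rule as the “data” around it.

```python
# Ordinary Rule 110 (not a physically universal CA). Whatever cells we
# choose to call "machine" and whatever we call "tape", the update rule
# below never sees that labelling -- there is no privileged machine/data split.
RULE = 110

def step(cells):
    n = len(cells)
    out = []
    for i in range(n):
        left, mid, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        pattern = (left << 2) | (mid << 1) | right
        out.append((RULE >> pattern) & 1)
    return out

cells = [0] * 30 + [1] + [0] * 9
for _ in range(5):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```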
I agree about embedded agency. The way in which agents are traditionally defined in expected utility theory requires assumptions (e.g. logical omniscience and lack of physical side effects) that break down in embedded settings, and if you drop those assumptions you’re left with something that’s very different from classical agents and can’t be accurately modeled as one. Control theory is a much more natural framework for modeling reinforcement learner (or similar AI) behavior than expected utility theory.
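To make the contrast concrete, here’s a minimal sketch (the thermostat example and all numbers are mine): the expected-utility agent needs a full action-to-outcome model and a utility function to take an argmax over, while the control-theoretic system just feeds an error signal back into its actuator, with no explicit model or utility anywhere.

```python
def eu_agent(model, utility):
    """One-shot EU maximizer: needs a full model {action: {outcome: prob}}."""
    return max(model, key=lambda a: sum(p * utility(o)
                                        for o, p in model[a].items()))

def p_controller(setpoint, reading, gain=0.5):
    """Proportional controller: output is just gain * error, no model."""
    return gain * (setpoint - reading)

# EU framing: choose among discrete actions under a (known) world-model.
model = {"heat": {"warm": 0.9, "cold": 0.1}, "idle": {"warm": 0.2, "cold": 0.8}}
print(eu_agent(model, lambda o: {"warm": 1, "cold": 0}[o]))   # 'heat'

# Control framing: track a setpoint from feedback alone.
temp = 15.0
for _ in range(5):
    temp += p_controller(setpoint=20.0, reading=temp)
    print(round(temp, 2))   # climbs toward 20.0
```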