I’ve listed one algorithm for which daemons are obviously a problem, namely Solomonoff induction. Now I’m describing a very similar algorithm, and wondering if daemons are a problem. As far as I can tell, any learning algorithm is plausibly beset by daemons, so it seems natural to ask for a variant of learning that isn’t.
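To make the setting concrete, here is a toy sketch (my own illustration, with made-up hypotheses and description lengths, not anything from the post) of the kind of mixture Solomonoff induction performs: predictors weighted by 2^(-description length), updated on the observed history. In this picture, a "daemon" would be a high-weight program in the mixture that is itself an optimizer reasoning about the predictor, rather than a simple rule like the ones below.

```python
# Toy, finite stand-in for a Solomonoff-style mixture (real Solomonoff
# induction enumerates all programs and is uncomputable; this is only an
# illustration). Each "program" is a predictor mapping a bit history to a
# predicted next bit, weighted by 2^(-description length in bits).

from typing import Callable, List, Tuple

# (name, description length in bits, predictor) -- all values are made up.
Hypothesis = Tuple[str, int, Callable[[List[int]], int]]

HYPOTHESES: List[Hypothesis] = [
    ("always 0",    2, lambda h: 0),
    ("always 1",    2, lambda h: 1),
    ("repeat last", 4, lambda h: h[-1] if h else 0),
    ("alternate",   5, lambda h: 1 - h[-1] if h else 0),
]

def posterior_weights(history: List[int]) -> List[float]:
    """Prior 2^(-length), times the likelihood each hypothesis gives the history."""
    weights = []
    for _, length, predict in HYPOTHESES:
        w = 2.0 ** (-length)
        for t in range(len(history)):
            # Deterministic hypotheses: likelihood 1 if the bit was predicted, else 0.
            w *= 1.0 if predict(history[:t]) == history[t] else 0.0
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights

def predict_next(history: List[int]) -> float:
    """Mixture probability that the next bit is 1."""
    weights = posterior_weights(history)
    return sum(w for (_, _, predict), w in zip(HYPOTHESES, weights)
               if predict(history) == 1)

if __name__ == "__main__":
    print(predict_next([0, 1, 0, 1]))  # "alternate" dominates: P(next bit = 1) is 0.0
    print(predict_next([1, 1, 1]))     # "always 1" dominates: P(next bit = 1) is 1.0
```

The daemon worry is that in the real mixture, unlike this toy one, some of the surviving high-weight hypotheses are simulations of consequentialist reasoners whose predictions are chosen to influence whoever is consulting the prior.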
I’m not sure exactly how to characterize the problem other than by doing this kind of exercise. This post is already implicitly considering a particular design for AGI; I don’t see what we gain by being more specific here.
That’s fair. I guess my intuition is that the Solomonoff induction example could use more work as motivation. Sampling a cell at a particular frequency seems fairly unrealistic to me; realistically, an AGI is going to be interested in more complex outcomes. So then there’s a connection to the idea of adversarial examples, where the consequentialists in the universal prior are trying to make the AGI think that something is going on when it isn’t actually going on. (Absent such deception, I’m not sure there is a problem. For example, if consequentialists reliably make their universe one in which everyone is truly having a good time for the purpose of possibly influencing a universal prior, then that will be true in our universe too, and we should take it into account for decision-making purposes.) But this is actually easier than typical adversarial examples, because an AGI also gets to observe the consequentialists plot their adversarial strategy and read their minds while they’re plotting. The AGI would have to be rather “dumb” to get tricked: if it’s simulating the universal prior at sufficiently high resolution to produce these weird effects, then by definition it’s able to see what is going on.
Humans already seem able to solve this problem: We simulate how others might think and react, and we don’t seem super worried about people we simulate internally breaking out of our simulation and hijacking our cognition. (Or at least, insofar as we do get anxious about e.g. putting ourselves in the shoes of people we dislike, this doesn’t have obvious relevance to an AGI—although again, perhaps this would be a good heuristic for brainstorming potential problems.) Anyway, my hunch is that this particular manifestation of the “daemon” problem will not require a lot of special effort once other AGI/FAI problems are solved.
> This post is already implicitly considering a particular design for AGI; I don’t see what we gain by being more specific here.
Does your idea of neural nets + RL involve use of the universal prior? If not, I think I would try to understand whether and how the daemon problem transfers to the neural nets + RL framework before trying to solve it. A solid description of a problem is the first step to finding a solution; the minimal version of that is a concrete example of how the problem could occur.
(Apologies if I am coming across as disagreeable—IMO, this is a mistake that FAI people make semi-frequently, and I would like for them to make it less often—you just got unlucky that I’m explaining myself in a comment on your post :P)