Cross-posting some comments from the MIRI Blog:

Konstantin Surkov:

I don’t get it. A human is obviously (in that regard) an agent reasoning about his actions. A human will also choose the $10 without any difficulty. What in the human decision-making process is not formalizable here? Assuming we agree that taking the $10 is the rational choice.
Abram Demski:
Suppose you know that you take the $10. How do you reason about what would happen if you took the $5 instead? It seems easy if you know how to separate yourself from the world, so that you only think of external consequences (getting $5). If you think about yourself as well, then you run into contradictions when you try to imagine the world where you take the $5, because you know it is not the sort of thing you would do. Maybe you have some absurd predictions about what the world would be like if you took the $5; for example, you imagine that you would have to be blind. That’s alright, though, because in the end you are taking the $10, so you’re doing fine.
Part of the point is that an agent can be in a similar position, except it is taking the $5, knows it is taking the $5, and is unable to figure out that it should be taking the $10 instead, due to the absurd predictions it makes about what happens when it takes the $10. It seems kind of hard for a human to end up in that situation, but it doesn’t seem so hard to get this sort of thing when we write down formal reasoners, particularly when we let them reason about themselves fully (as natural parts of the world) rather than only reasoning about the external world or having pre-programmed divisions (so that they reason about themselves in a different way from how they reason about the world).
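To make that failure mode concrete, here is a minimal sketch in Python. It is not MIRI’s formalism, and every name in it is invented for illustration; it models the agent’s counterfactual “if I took action a, I would get utility u” as a bare material conditional, evaluated by an agent that already knows which action it actually takes.

```python
# Toy model of "spurious counterfactuals" in the 5-and-10 problem.
# Assumption (for this sketch only): the agent evaluates counterfactuals
# as material conditionals, and it knows its own action.

def implies(p: bool, q: bool) -> bool:
    """Material implication: a false antecedent makes the conditional true."""
    return (not p) or q

def consistent_utilities(actual_action: int, hypothetical: int) -> list[int]:
    """All utilities u for which 'I take `hypothetical` -> I get $u' holds,
    given that the agent knows it in fact takes `actual_action`.
    The game's true payoff rule is simply: taking $n yields $n."""
    takes_it = (hypothetical == actual_action)
    return [u for u in (-1000, 0, 5, 10) if implies(takes_it, u == hypothetical)]

if __name__ == "__main__":
    # An agent that knows it takes the $5:
    print(consistent_utilities(5, 5))    # [5] -- the taken action is pinned down
    print(consistent_utilities(5, 10))   # [-1000, 0, 5, 10] -- anything goes
```

Because every utility claim about the untaken $10 comes out vacuously true, an agent that happens to settle on a pessimistic one (the analogue of “I would have to be blind”) sees no reason to switch, and taking the $5 then confirms the very self-knowledge the vacuous reasoning rested on.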
Konstantin Surkov:

Sure, one can imagine hypothetically taking the $5, even if in reality they would take the $10. That’s a spurious output from a different algorithm altogether; it assumes a world where you are not the same person who takes the $10. So it would make sense to examine which of the two you are if you don’t yet know that you will take the $10, but not if you already know it. Which of the two is it?