I don’t (confidently) understand why the procrastination paradox indicates a problem to be solved. Could you clarify that for me, or point me to a clarification?
First off, it doesn’t seem like this kind of infinite buck-passing could happen in real life; is there a real-life (finite?) setting where this type of procrastination leads to bad actions? Second, it seems to me that similar paradoxes often come up in other situations where agents have infinite time horizons and can wait as long as they want—does the problem come from the infinity, or from something else?
The best explanation that I can give is “It’s immediately obvious to a human, even in an infinite situation, that the only way to get the button pressed is to press it immediately. Therefore, we haven’t captured human reasoning (about infinite situations), and we should capture that human reasoning in order to be confident about AI reasoning.” This is AFAICT the explanation Nate gives in the Vingean Reflection paper. Is that how you would express the problem?
It is definitely a problem with infinite buck-passing. It is probably possible to prove optimality if we have a continuous utility function (e.g. we’re using discounting). I think we might actually want a continuous utility function, but maybe not. Is there any time t such that you would consider it almost as good for a wonderful human civilization to exist for t steps and then die, compared to existing indefinitely?
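To make the role of continuity concrete, here is a toy sketch under a hypothetical model (not taken from either paper). Each agent t holds the button and assumes its successor will press at step t + 1, so it presses only if deferring has a strictly positive cost. With the undiscounted "1 if ever pressed, 0 if never" utility, deferring always looks free, and every agent passes the buck. With discounting, the value of pressing at step t tends to the value of never pressing, and deferring has a cost from the very first step. The names `undiscounted`, `discounted`, `first_press`, and the discount factor `gamma = 0.9` are all illustrative assumptions.

```python
from typing import Callable, Optional

# Hypothetical toy model: press_time is the step at which the button is
# pressed, or None if it is never pressed.
Utility = Callable[[Optional[int]], float]


def undiscounted(press_time: Optional[int]) -> float:
    """1 if the button is ever pressed, 0 if never: not continuous in the limit."""
    return 0.0 if press_time is None else 1.0


def discounted(press_time: Optional[int], gamma: float = 0.9) -> float:
    """gamma**t for pressing at step t; this tends to 0, the value of never pressing."""
    return 0.0 if press_time is None else gamma ** press_time


def first_press(utility: Utility, horizon: int) -> Optional[int]:
    """Agent t trusts that its successor presses at t + 1, so it defers unless
    deferring strictly lowers utility. Returns the first step at which some
    agent presses, or None if nobody presses within the horizon."""
    for t in range(horizon):
        if utility(t + 1) < utility(t):
            return t  # deferring has a positive cost, so press now
    return None


print(first_press(undiscounted, 10_000))  # None: every agent passes the buck
print(first_press(discounted, 10_000))    # 0: pressing immediately is optimal
```

In this sketch the undiscounted case fails only because its utility is discontinuous at "never": every finite press time scores 1, but their limit scores 0, so no single step's local comparison ever reveals the loss.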
The way I would express the procrastination paradox is something like:
1. There’s the tiling agents problem: we want AIs to construct successors that they trust to make correct decisions.
2. It would be desirable to have a system where an infinite sequence of AIs each trust the next one. If it worked, this would solve the tiling agents problem.
3. But if we have something like this, then it will be unsound: it will prove that the button will eventually get pressed, even though it will never actually get pressed.
4. We can construct things that do press the button, but they lack the property of trusting successors, which is desirable in some ways. Because of how they handle recursion, Paul’s logic and reflective oracles are both candidates for solving the tiling agents problem; however, both fail the procrastination paradox (when it’s set up this way).
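A hypothetical simulation may help separate the finite case from the infinite one. It is only a toy: it does not model Paul’s logic or reflective oracles, and `run_chain` and `max_steps` are illustrative names. Each agent trusts its successor to get the button pressed, so it presses only when it knows it has no successor. In a finite chain the last agent presses; in an unbounded chain every agent defers. That matches the unsoundness described above: each agent’s belief that the button will eventually be pressed is never made true.

```python
from typing import Optional


def run_chain(length: Optional[int], max_steps: int = 1_000) -> Optional[int]:
    """Simulate a chain of agents that each trust their successor.

    length is the number of agents, or None for an unbounded chain.
    Agent t presses only if it has no successor; otherwise it reasons
    "my successor will get the button pressed" and passes the buck.
    Returns the step at which the button is pressed, or None if it is
    not pressed within max_steps.
    """
    for t in range(max_steps):
        has_successor = length is None or t + 1 < length
        if not has_successor:
            return t  # last agent: no one left to defer to
    return None


print(run_chain(5))     # 4: the last agent in a finite chain presses
print(run_chain(None))  # None: in the unbounded chain nobody presses
```

Note that a finite simulation can only show that no press happens within `max_steps`. It cannot establish "never" for the unbounded chain; that gap is why the question concerns what the agents prove rather than what any finite run shows.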
Cool, thanks; sounds like I have about the same picture. One missing ingredient for me that was resolved by your answer, and by going back and looking at the papers again, was the distinction between consistency and soundness (on the natural numbers), which is not a distinction I think about often.
In case it’s useful, I’ll note that the procrastination paradox is hard for me to take seriously on an intuitive level, because some part of me thinks that requiring correct answers in infinite decision problems is unreasonable; so many reasoning systems fail on these problems, and infinite situations seem so unlikely, that they are hard for me to get worked up about. This isn’t so much a comment on how important the problem actually is as on how much argumentation may be required to convince people like me that it’s actually worth working on.