Thanks, Paul—I missed this response earlier, and I think you’ve pointed out some of the major disagreements here.
I agree that there’s something somewhat consequentialist going on during all kinds of complex computation. I’m skeptical that we need better decision theory to do this reliably—are there reasons or intuition-pumps you know of that have a bearing on this?
I mentioned two (which I don’t find persuasive):
- Different decision theories / priors / etc. are reflectively consistent, so you may want to make sure to choose the right ones the first time. (I think that the act-based approach basically avoids this.)
- We have encountered some surprising possible failure modes, like blackmail by distant superintelligences, and might be concerned that we will run into new surprises if we don’t understand consequentialism well.
I guess there is one more:
- If we want to understand what our agents are doing, we need to have a pretty good understanding of how effective decision-making ought to work. Otherwise algorithms whose consequentialism we understand will tend to be beaten out by algorithms whose consequentialism we don’t understand. This may make alignment way harder.