[...] without having to know all the future temperatures of the room, because I can cleanly describe the things I care about. [...] My goal with the thermostat example is just to point out that that isn’t (as far as I can see) because of a fundamental limit in how precisely you can predict the future.
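(As an aside, the quoted thermostat point can be made concrete in a few lines; this is my own minimal sketch, not anything from your comment, and the function name and band width are made up. A bang-bang controller never forecasts: it only compares the one cleanly specified variable it cares about against its setpoint.)

```python
# A minimal sketch of the quoted thermostat point (my own illustration).
# A bang-bang controller needs no forecast of future room temperatures:
# it only reads the one cleanly specified variable it cares about.
def thermostat(current_temp: float, setpoint: float, band: float = 0.5) -> str:
    """Decide the heater command from the present state alone; no prediction."""
    if current_temp < setpoint - band:
        return "heat_on"
    if current_temp > setpoint + band:
        return "heat_off"
    return "hold"  # within the comfort band, do nothing

# Correct under any future temperature trajectory, because "the thing I
# care about" reduces to a single measurable quantity.
print(thermostat(18.2, 20.0))  # -> heat_on
```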
I think there was a gap in my reasoning; let me put it this way: as you said, only when you can cleanly describe the things you care about can you design a system that doesn’t game your goals (the thermostat). However, my reasoning suggests that one way in which you may not be able to cleanly describe the things you care about (the predictive variables) is the inaccuracy attribution degeneracy I mention in the post. In other words, you don’t (and possibly can’t) know whether the variable you’re interested in is being forecasted inaccurately because of a lack of relevant things in the specification (the most common case) or because of misspecified initial conditions of all the relevant variables.
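To make that degeneracy concrete, here is a toy numerical sketch (entirely my own construction, on the textbook Lorenz-63 system; the tiny rho perturbation is a stand-in for "a relevant thing missing from the specification"). From the forecast error alone, a run that degrades because of a perturbed initial condition is essentially indistinguishable from one that degrades because the model itself is slightly misspecified:

```python
# Toy sketch of the attribution degeneracy (my own construction, on the
# textbook Lorenz-63 system). Run 1 uses the right model with a perturbed
# initial condition; run 2 uses the exact initial condition with a slightly
# misspecified parameter. Their forecast errors against the "true" run grow
# at roughly the same exponential rate, so the error alone can't tell you
# which kind of misspecification you have.
import numpy as np

def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 equations."""
    x, y, z = state
    deriv = np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
    return state + dt * deriv

def trajectory(state, n_steps, dt=0.01, **params):
    out = np.empty((n_steps, 3))
    for i in range(n_steps):
        state = lorenz_step(state, dt, **params)
        out[i] = state
    return out

rng = np.random.default_rng(0)
x0 = np.array([1.0, 1.0, 1.0])
n = 2000

truth = trajectory(x0, n)                                   # the "real" system
bad_ic = trajectory(x0 + 1e-6 * rng.standard_normal(3), n)  # wrong initial condition
bad_model = trajectory(x0, n, rho=28.0 * (1 + 1e-6))        # wrong model

for label, run in [("perturbed IC:   ", bad_ic), ("perturbed model:", bad_model)]:
    err = np.linalg.norm(run - truth, axis=1)
    print(label, ["%.2g" % err[t] for t in (200, 800, 1400, 1999)])
```

Both error columns climb from around 1e-6 toward order one in lockstep (at roughly the leading Lyapunov rate), which is the degeneracy I mean: the error signal doesn’t tell you whether to fix the state estimate or the specification.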
I claim that we would still have the same kinds of risks from advanced RL-based AI that we have now, because we don’t have a reliable way to clearly specify our complete preferences and have the AI correctly internalize them.
I partially agree: I’d say that, in that hypothetical case, you’ve solved one layer of complexity and this other one you’re mentioning still remains! I don’t claim that solving the issues raised by chaotic unpredictability solves goal gaming, but I do claim that without solving the former you cannot solve the latter (i.e., solving chaos is a necessary but not sufficient condition).