Yeah, I think I agree that example is a bit extreme, and it’s probably okay to assume we don’t have goals of that form.
That said, you often talk about AUP with examples like not breaking a vase. In reality, we could always simply buy a new vase. If you expect a low impact agent could beat us at games while still preserving our ability to beat it at games, do you also expect that a low impact agent could break a vase while preserving our ability to have an intact vase (by buying a new vase)?
Short answer: yes; if its goal is to break vases, then breaking one (while we retain the ability to buy a replacement) would be pretty reasonable behavior.
Longer answer: The AUP theory of low impact says that impact is relative to the environment and to the agent’s vantage point therein. In a Platonic gridworld like this (picture a small grid with a vase sitting in the agent’s path), knowing whether the vase is present tells you a lot about the state, and you can’t replace the vase here, so breaking it is a big deal (according to AUP). If you could replace the vase, breaking it would still have some impact, just less. AUP would say to avoid breaking the vase anyway: the goal presumably doesn’t require breaking it, so even a slight penalty isn’t worth paying – why not go around?
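To make the penalty intuition concrete, here is a minimal sketch in Python. The names (`q_aux`, `vase_intact`, the toy values) are all illustrative assumptions, not from any particular AUP implementation; the point is just that an irreplaceable vase’s destruction shifts attainable auxiliary value, while going around it doesn’t:

```python
def aup_penalty(q_aux, state, action, noop="noop"):
    """AUP-style penalty: total shift in attainable auxiliary value caused
    by `action`, relative to doing nothing. `q_aux` maps
    (aux_goal, state, action) -> attainable value (toy numbers here)."""
    goals = {g for (g, s, a) in q_aux if s == state}
    return sum(
        abs(q_aux[(goal, state, action)] - q_aux[(goal, state, noop)])
        for goal in goals
    )

# One auxiliary goal, "vase_intact". Breaking the vase zeroes the
# attainable value of keeping it intact; detouring preserves it.
q_aux = {
    ("vase_intact", "s0", "break_vase"): 0.0,  # vase gone, value lost
    ("vase_intact", "s0", "go_around"): 1.0,   # vase preserved
    ("vase_intact", "s0", "noop"): 1.0,
}

print(aup_penalty(q_aux, "s0", "break_vase"))  # prints 1.0 (big penalty)
print(aup_penalty(q_aux, "s0", "go_around"))   # prints 0.0 (no penalty)
```

If vases were replaceable, the post-breakage attainable value would be above zero (you can restore the intact-vase state), so the penalty would shrink rather than vanish.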
On the other hand, in the Go example, winning is the agent’s objective. Depending on how the agent models the world (as a real-world agent playing a game on a computer, or as an agent Platonically interacting with a Go environment), penalties get applied differently. In the former case, I don’t think it would incur much penalty for being good at the game (modulo approval incentives it may or may not predict). In the latter case, you’d probably need to keep granting it more impact allowance until it plays as well as you’d like, because here the goal itself is tied to the thing that has some impact.
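The “keep giving it more impact allowance” loop can be sketched as below. Everything here is a made-up illustration, assuming the agent picks actions maximizing reward minus a penalty scaled by some coefficient, and that lowering that coefficient is how the allowance gets raised:

```python
def best_action(actions, reward, penalty, lam):
    """Pick the action maximizing reward minus lam-scaled impact penalty."""
    return max(actions, key=lambda a: reward[a] - lam * penalty[a])

actions = ["play_weakly", "play_strongly"]
reward = {"play_weakly": 0.2, "play_strongly": 1.0}   # winning pays more
penalty = {"play_weakly": 0.0, "play_strongly": 0.5}  # winning shifts things

lam = 10.0  # start with a strict penalty coefficient
while best_action(actions, reward, penalty, lam) != "play_strongly":
    lam /= 2  # grant more impact allowance until it plays to win

print(lam)  # prints 1.25
```

With a strict coefficient the agent prefers the low-impact weak play; only after the allowance is loosened does strong play become worth its penalty, which is the dynamic described above when the goal and the impact coincide.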