As long as the user and AI appreciate the arguments we are making right now, we shouldn’t expect it to do worse than stealing the unaligned AI’s strategy. There is all the usual ambiguity about “what the user wants,” but if the user expects that the resources other agents are gathering will be more useful than the resources their AI is gathering, then their AI would clearly do better (in the user’s view) by doing what others are doing.
There could easily be an abstract argument that other agents are gathering more useful resources, but still no way (or no corrigible way) to “do better by doing what others are doing”. For example, suppose I’m playing chess with a superhuman AI. I know the other agent is gathering more useful resources (e.g., taking up better board positions), but there’s nothing I can do about it except to turn over all of my decisions to my own AI that optimizes directly for winning the game (rather than for any instrumental or short-term preferences I might have for how to win the game).
I think I won’t have time to engage much on this in the near future.
Ok, I tried to summarize my current thoughts on this topic as clearly as I can here, so you’ll have something concise and coherent to respond to when you get back to this.