I agree with most of what you wrote, but have questions about a couple of sections.
R has good third-person counterfactuals.
I’m not sure this problem is important or has a solution. Why do you think it’s an important problem to solve?
A should handle exotic environments. That is, A should do something sensible, even when placed in environments that might be considered pathological. For example, A should make appropriate use of halting oracles, time travel, infinite computational resources, and so on.
This desideratum doesn’t seem very important.
Why do you say it doesn’t seem very important? It’s plausible to me that when we really understand ontology and our values, it will be apparent that the most important “environments” for A to influence are the ones higher up on the arithmetical hierarchy (or other mathematical hierarchies). It seems plausible that those worlds might be much “bigger”, “more complex”, or “more interesting”, or have more resources, to such an extent that the utility we can get from them overwhelms the utility we can get from the more mundane environments that we can also influence.
It’s also plausible that the above is false, but that A fails by acting as if it were true. It seems to me that both of these failure modes are safety-relevant, in the sense that errors could cost a large fraction of our potential utility and probably won’t become apparent until A becomes very powerful.
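(For concreteness, by the arithmetical hierarchy above I mean the standard classification of sets of naturals by quantifier complexity; the following is just a minimal reminder of that definition, not anything specific to A.) A set $S \subseteq \mathbb{N}$ is $\Sigma^0_n$ if, for some decidable relation $R$,
\[
x \in S \;\iff\; \exists y_1\,\forall y_2\,\cdots\,Q_n y_n\; R(x, y_1, \dots, y_n),
\]
with $n$ alternating quantifier blocks; $\Pi^0_n$ consists of the complements of $\Sigma^0_n$ sets, and $\Delta^0_n = \Sigma^0_n \cap \Pi^0_n$. The halting problem is $\Sigma^0_1$-complete, and by Post’s theorem a machine with a halting oracle decides exactly the $\Delta^0_2$ sets, so “higher up the hierarchy” corresponds to environments whose structure requires iterating that jump.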
Earlier in a related section you wrote “This seems like a problem we should be able to defer to future versions”. Did you mean future versions of decision theory that we research before building an AI, or future versions of AI that the initial AI self-improves into? In either case, why is this problem more deferrable than the other problems listed here?