For example, a reward function for which inaction is the only optimal policy is “unaligned” and non-catastrophic.
Though if a system for preventing catastrophe (say, an asteroid impact prevention/mitigation system) had its reward function replaced with the inaction reward function, or was shut down at a critical time, that replacement/shutdown could itself be a catastrophic act.
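To make the first claim concrete, here is a minimal sketch (in Python, with hypothetical action names like `DEFLECT_ASTEROID`) of a reward function for which inaction is the only optimal policy: the no-op is the unique reward-maximizing action, so an agent optimizing this reward never acts.

```python
from enum import Enum, auto

class Action(Enum):
    NOOP = auto()              # "do nothing" (hypothetical action set)
    DEFLECT_ASTEROID = auto()
    LAUNCH_PROBE = auto()

def inaction_reward(action: Action) -> float:
    """A reward function whose only optimal policy is inaction:
    the no-op gets reward 0, every other action gets -1,
    so the unique reward-maximizing policy never acts."""
    return 0.0 if action is Action.NOOP else -1.0

# The optimal action under this reward is trivially the no-op:
optimal = max(Action, key=inaction_reward)
assert optimal is Action.NOOP
```

Such an agent is "unaligned" in the sense of not doing what we want, yet harmless in isolation; the second claim is that the harm comes from substituting this reward function into a system whose action we depend on.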