For example, a reward function for which inaction is the only optimal policy is “unaligned” and non-catastrophic.
Though if a system for preventing catastrophe (say, an asteroid impact prevention/mitigation system) had its reward function replaced with the inaction reward function, or was shut down at a critical time, that replacement/shutdown could itself be a catastrophic act.
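To make the first claim concrete, here is a minimal sketch (in Python, with hypothetical action names like `DEFLECT_ASTEROID`) of a reward function for which inaction is the only optimal policy: the no-op is the unique reward-maximizing action, so an agent optimizing this reward never acts.

```python
from enum import Enum, auto

class Action(Enum):
    NOOP = auto()              # "do nothing" (hypothetical action set)
    DEFLECT_ASTEROID = auto()
    LAUNCH_PROBE = auto()

def inaction_reward(action: Action) -> float:
    """A reward function whose only optimal policy is inaction:
    the no-op gets reward 0, every other action gets -1,
    so the unique reward-maximizing policy never acts."""
    return 0.0 if action is Action.NOOP else -1.0

# The optimal action under this reward is trivially the no-op:
optimal = max(Action, key=inaction_reward)
assert optimal is Action.NOOP
```

Such an agent is "unaligned" in the sense of not doing what we want, yet harmless in isolation; the second claim is that the harm comes from substituting this reward function into a system whose action we depend on.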