Would not effectively resist M(-u), a u-minimizer.
I’m not sure how that’s supposed to work. S(u) won’t do much as long as the desirability threshold is met, but if M(-u) comes along and makes the threshold difficult to reach, S(u) would use everything it has to stop M(-u). Are you using something beyond a desirability threshold? Something where S(u) stops not when the solution is good enough, but when it gets difficult to improve?
See my edit above. “would use everything it has to...” is exactly the kind of behaviour we want to avoid. So I’m following the satisficing intuition more than the formal definition. I can justify this by going meta: when people design or imagine satisficers, they generally look around at the problem, see what can be achieved, how hard it is, etc., and then set the threshold. I want to automate “set a reasonable threshold” as well as “be a reasonable satisficer”, in order to achieve “don’t have a huge impact on the world”.
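To make that concrete, here’s a minimal sketch of the kind of thing I have in mind. This is purely illustrative: the function name, the quantile choice, and the stopping parameters are placeholders I’m inventing for this comment, not a worked-out proposal.

```python
import random

def reasonable_satisficer(utility, candidates, effort_budget=100,
                          quantile=0.5, min_gain=1e-3, patience=5):
    # Step 1, "look around at the problem": cheaply sample what is
    # achievable and set the threshold at a modest quantile of that,
    # instead of hard-coding it in advance.
    sample = random.sample(candidates, min(20, len(candidates)))
    scores = sorted(utility(c) for c in sample)
    threshold = scores[int(quantile * (len(scores) - 1))]

    # Step 2, "be a reasonable satisficer": search under a hard effort
    # budget, stopping early either when the threshold is met (good
    # enough) or when improvement has become difficult (patience runs
    # out), rather than using everything it has.
    best, best_score, stale = None, float("-inf"), 0
    for step, c in enumerate(candidates):
        if step >= effort_budget:
            break                  # bounded effort, no matter what
        score = utility(c)
        if score > best_score + min_gain:
            best, best_score, stale = c, score, 0
        else:
            stale += 1
        if best_score >= threshold:
            break                  # good enough: the classic satisficing stop
        if stale >= patience:
            break                  # progress got difficult: give up instead
    return best, best_score, threshold

# Toy usage: accept an option scoring at least the sampled median,
# without exhausting the whole search space.
options = [random.random() for _ in range(1000)]
print(reasonable_satisficer(lambda x: x, options))
```

The point of the extra stopping rules is that S(u) never gets into an arms race with M(-u): if M(-u) makes the threshold hard to reach, the patience and budget cutoffs fire and S(u) stops, rather than escalating.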