See my edit above. “Would use everything it has to...” is the kind of behaviour we want to avoid, so I’m following the satisficing intuition rather than the formal definition. I can justify this by going meta: when people design or imagine satisficers, they generally look around at the problem, see what can be achieved, how hard it is, etc., and then set the threshold. I want to automate “set a reasonable threshold” as well as “be a reasonable satisficer”, in order to achieve “don’t have a huge impact on the world”.
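To make the intuition concrete, here is a toy sketch of what “automate the threshold” could mean over a finite action set. Everything here is illustrative and my own assumption (the 0.8 fraction, the function names, the finite enumeration of actions), not a formal proposal:

```python
def satisfice(utility, actions, fraction=0.8):
    """Toy satisficer that sets its own threshold.

    fraction=0.8 is an arbitrary illustrative choice: the agent
    surveys what is achievable, sets a 'good enough' bar relative
    to that, and stops rather than optimizing to the maximum.
    """
    best = max(utility(a) for a in actions)   # "look around at the problem"
    threshold = fraction * best               # automate "set a reasonable threshold"
    # Return the first action that clears the bar: "be a reasonable satisficer".
    return next(a for a in actions if utility(a) >= threshold)
```

The point of the sketch is only that the threshold is derived from the surveyed option space rather than hard-coded by the designer; how to do this safely for an agent acting in the real world is exactly the open question.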