The main problem with making a satisficer, as I currently see it, is that I don’t actually know how to define utility functions in a way that’s amenable to well-defined satisficing. The aim is, in effect, to define goals in a way that limits the total resources the AI will consume in service of the goal, without having to formally define what a resource is. This seems to work pretty well with values defined in terms of the existence of things: a satisficer that just wants to make 100 paperclips, and has access to a factory, will probably do so in an unsurprising way. But it doesn’t work so well with non-existence-type values; a satisficer that wants to ensure there are no more than 100 cases of malaria might turn all matter into space exploration probes to hunt down cases on other planets. A satisficer that wants to reduce the number of cases of malaria by at least 100 might work, but then your utility function is defined in terms of a tricky counterfactual. For example, if it’s “100 fewer cases than if I hadn’t been turned on”, then… what if its not having been turned on would have led to a minimizer being created instead? Then you’re back to converting all atoms into space probes.
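To make the contrast a bit more concrete, here is a rough gloss of the three goals above as threshold-style utility functions. This is just my own notation, not a formal proposal; $w$ is the world the satisficer actually brings about, and $w_0$ is the counterfactual world in which it was never turned on:

$$U_{\text{paperclips}}(w) = \mathbb{1}\big[\#\text{paperclips}(w) \ge 100\big]$$
$$U_{\text{malaria}}(w) = \mathbb{1}\big[\#\text{cases}(w) \le 100\big]$$
$$U_{\text{reduction}}(w) = \mathbb{1}\big[\#\text{cases}(w) \le \#\text{cases}(w_0) - 100\big]$$

The worry about the minimizer shows up directly in $U_{\text{reduction}}$: if the counterfactual world $w_0$ contains a minimizer, then $\#\text{cases}(w_0)$ is driven toward zero, and satisfying the threshold demands roughly the same extreme behaviour the satisficer was supposed to avoid.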