Un-optimised vs anti-optimised
A putative new idea for AI control; index here.
This post contains no new insights; it just puts together some old insights in a format I hope is clearer.
Most satisficers are un-optimised (above the satisficing level): they have only a limited drive to optimise and transform the universe. They may still end up optimising the universe anyway: they incur no penalty for doing so (and sometimes it’s a good idea for them). But if they can lazily achieve their goal, they’re happy with that too. They simply have low optimisation pressure.
A safe “satisficer” design (or a reduced-impact AI design) needs to be not only un-optimised, but specifically anti-optimised. It has to be set up so that “go out and optimise the universe” scores worse than “be lazy and achieve your goal”. The problem is that these terms are undefined (as usual), that there are many minor actions that can optimise the universe (such as creating a subagent), and that the approach has to be safe against all possible ways of optimising the universe, not just maximising a specific and known utility u.
That’s why the reduced impact/safe satisficer/anti-optimised designs are so hard: you have to add a very precise yet general (anti-)optimising pressure, rather than simply removing the current optimising pressure.
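To make the distinction concrete, here is a minimal toy sketch in Python (the utilities and the “impact” measure are invented for the example; this is an illustration, not a proposed design): the satisficer is indifferent between every action above its threshold, universe-optimising ones included, while the anti-optimised agent needs an explicit impact penalty. The whole difficulty described above is that we don’t know how to define that penalty term precisely yet generally.

```python
# Toy illustration: satisficer vs anti-optimised agent.
# All utilities and "impact" numbers are invented for the example.

THRESHOLD = 0.9  # satisficing level

actions = {
    # action: (expected utility u, crude "optimisation impact" measure)
    "lazily achieve the goal": (0.95, 0.1),
    "optimise the universe": (1.00, 100.0),
    "build an optimising subagent": (0.99, 100.0),
    "do nothing": (0.20, 0.0),
}

def satisficer_accepts(action):
    """A satisficer accepts *any* action above the threshold,
    including the universe-optimising ones."""
    utility, _ = actions[action]
    return utility >= THRESHOLD

def anti_optimised_score(action, penalty_weight=0.05):
    """An anti-optimised design subtracts an impact penalty, so that
    lazy goal achievement outscores large-scale optimisation."""
    utility, impact = actions[action]
    return utility - penalty_weight * impact

print([a for a in actions if satisficer_accepts(a)])
# -> the satisficer is fine with all three high-utility actions
print(max(actions, key=anti_optimised_score))
# -> the anti-optimised agent picks "lazily achieve the goal"
```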
Would minimising the number of CPU cycles work as an incentive for laziness?
This assumes that fewer CPU cycles will produce an outcome that is satisficed rather than optimised; in our current state of understanding, optimisation routines do take a lot more computing effort than ‘rough enough’ solutions.
Perhaps getting AGIs to go green would kill two birds with one stone.
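To make the proposal concrete, a minimal sketch with invented numbers: the agent’s score is its utility minus a small cost per estimated CPU cycle, so the cheapest plan that achieves the goal wins. The weight and the cycle counts are assumptions for illustration only.

```python
# Sketch of the CPU-cycle proposal: penalise estimated cycles so the
# cheapest plan that achieves the goal is preferred. Numbers invented.

CYCLE_COST = 1e-9  # assumed penalty per CPU cycle

plans = {
    # plan: (expected utility, estimated CPU cycles)
    "rough-enough solution": (0.95, 1e6),
    "heavily optimised solution": (1.00, 1e12),
}

def score(plan):
    utility, cycles = plans[plan]
    return utility - CYCLE_COST * cycles

print(max(plans, key=score))  # picks "rough-enough solution"
```

Note that the cycle count here is only the agent’s own computation, and has to be estimated before the problem is solved, which is where the objections below bite.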
This has problems with the creation of subagents: http://lesswrong.com/lw/lur/detecting_agents_and_subagents/
The AI can use a few CPU cycles to create subagents that are not bound by that restriction.
It can be difficult, or even impossible, to know how many CPU cycles a problem will take to solve before actually solving it.