Under the strict meliorizing criterion as written, it would make sense to swap to a program that promised to save 1,000 people, let all the others die, and make no further improvements, since doing so would still be better than not swapping. By the line of argument in section 4 of Schmidhuber (2007), the agent ought instead to consider that keeping the previous program and waiting for it to generate a better alternative could be better still.
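To make the failure mode concrete, here’s a minimal sketch of the strict swap rule; the program names and numbers are hypothetical, chosen only to mirror the example above:

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    expected_value: float  # e.g. expected number of people saved
    keeps_searching: bool  # does it keep looking for improvements?


def strict_meliorize(current: Program, candidate: Program) -> Program:
    """Swap whenever the candidate's expected value is higher, full stop."""
    return candidate if candidate.expected_value > current.expected_value else current


incumbent = Program("keep-searching", expected_value=900.0, keeps_searching=True)
dead_end = Program("save-1000-and-stop", expected_value=1000.0, keeps_searching=False)

# The strict criterion takes the dead end (1000 > 900), even though the
# incumbent, left running, might later propose a program that saves everyone.
print(strict_meliorize(incumbent, dead_end).name)  # -> save-1000-and-stop
```

Nothing in the rule looks at whether the candidate will ever improve again, which is exactly the objection.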
Can’t you require that the agents you swap to spend at least some fraction of their effort on meliorizing? Each swap could lower that fraction, based on how much the expected value had increased (the closer we are to the goal, the less additional search we need) and how much effort had already been expended (if we’ve searched enough, we can be fairly confident there’s no better solution left to find). More formally, you would want to scale your meliorizing effort with the optimality gap you’re facing (or whatever crude approximation of it you have) and with the cost of further search relative to your current best plan (you might have another day to spend looking, or it might be that if you don’t stop planning and start doing now, you lose everyone).
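Here’s one way that schedule might look as code; a rough sketch, where the particular functional form, the constants, and the input estimates (gap, prior effort, delay cost) are all my own assumptions rather than anything from Schmidhuber’s setup:

```python
def meliorizing_fraction(
    gap_estimate: float,          # crude estimate of (optimum - current EV), >= 0
    search_effort_so_far: float,  # effort already spent looking for improvements
    delay_cost: float,            # expected value lost per unit of further search
    floor: float = 0.05,          # never drop search effort to zero
) -> float:
    """Fraction of effort to devote to searching for a better program.

    A large remaining gap argues for more search; lots of fruitless prior
    search, or a steep cost of delay, argues for less, down to the floor.
    """
    # The value of further search shrinks as the gap closes and as prior
    # search accumulates without turning up anything better.
    search_value = gap_estimate / (1.0 + search_effort_so_far)
    # Weigh that against the cost of delaying the current best plan.
    fraction = search_value / (search_value + delay_cost + 1e-9)
    return max(floor, min(1.0, fraction))


# Barely searched, cheap delay: keep mostly searching (~0.91 here).
print(meliorizing_fraction(gap_estimate=100.0, search_effort_so_far=1.0, delay_cost=5.0))
# "If you don't stop planning and start doing now, you lose everyone":
# a huge delay cost collapses the fraction to the floor.
print(meliorizing_fraction(gap_estimate=100.0, search_effort_so_far=1.0, delay_cost=10_000.0))
```

The floor is the load-bearing choice: it encodes the requirement that swapped-to agents never abandon meliorizing entirely, which is what rules out the save-1,000-and-stop program above.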