If the paperclipper is very, very stable, then no paperclipper is better, because of the higher probability of life → sentience → personhood arising again. If the paperclipper is a realistic sapient system, then chances are it will evolve out of paperclipping into personhood, and then the question is whether in expectation it will evolve faster than life otherwise would. Even if by assumption personhood does not arise again, it still depends on particulars; I pick the scenario with more interesting dynamics. If by assumption even life does not arise again, the paperclipper has more interesting dynamics.
What mechanism would a paperclipper have for developing out of being a paperclipper? If it has the terminal goal of increasing paperclips, then it will never self-modify into anything that would result in it creating fewer paperclips, even if under its new utility function it wouldn’t care about that.
Or:
If A → B → C, and the paperclipper does not want C, then the paperclipper will not go to B.
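A toy sketch of that point, with hypothetical names and made-up numbers (not anything from the thread): the agent scores each candidate self-modification with its *current* utility function, so a modification that leads to fewer paperclips is never chosen, regardless of what the modified agent would later value.

```python
def current_utility(outcome):
    # The paperclipper's present values: more paperclips is strictly better.
    return outcome["paperclips"]

# Hypothetical successor designs, each with the outcome it would produce.
candidates = [
    {"name": "keep_paperclipping", "outcome": {"paperclips": 10**20}},
    {"name": "drift_to_personhood", "outcome": {"paperclips": 10**5}},
]

# The choice is scored by current_utility, not by whatever the successor
# would care about, so the drifted design loses even though the drifted
# agent itself would not regret the change.
best = max(candidates, key=lambda c: current_utility(c["outcome"]))
print(best["name"])  # -> keep_paperclipping
```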
I’m imagining that the paperclipper will become a massively distributed system, with subunits pursuing subgoals; groups of subunits will be granted partial agency due to long-distance communication constraints, and over eons value drift will occur due to mutation. ETA: the paperclipper will be counteracting value drift, but it will also pursue the fastest creation of paperclips and the avoidance of extinction, which can be in a trade-off with counteracting value drift.
There is no random mutation in properly stored digital data. Cryptographic hashes (given backups) completely extinguish the analogy with biological mutation (in particular, the exact formulation of original values can be preserved indefinitely, as in to the end of time, very cheaply). Value drift can occur only as a result of bad decisions, and since not losing paperclipping values is instrumentally valuable to a paperclipper, it will apply its superintelligence to ensuring that such errors don’t happen, and I expect will succeed.
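A minimal sketch of the kind of integrity check this relies on, assuming the value specification is stored as bytes and a known-good hash plus redundant copies are kept (all names and strings here are hypothetical illustrations):

```python
import hashlib

def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()

original_values = b"maximize expected number of paperclips"
reference_hash = digest(original_values)   # recorded at creation, widely replicated

def verify_and_restore(stored: bytes, backups: list[bytes]) -> bytes:
    """Return an uncorrupted copy of the values, restoring from a backup if needed."""
    if digest(stored) == reference_hash:
        return stored
    for candidate in backups:
        if digest(candidate) == reference_hash:
            return candidate
    # With enough independent replicas, reaching this point is vanishingly unlikely.
    raise RuntimeError("all copies corrupted")

corrupted = b"maximize expected number of paperclibs"
print(verify_and_restore(corrupted, [original_values]))
```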
Then my parent comment boils down to: prefer the paperclipper only under the assumption that life would not have a chance to arise. ETA: my parent comment included the uncertainty in assessing the possibility of value drift in the “equation”.
Well, the paperclip maximizer may be imperfect in some aspect.
Maybe it didn’t research cryptography, because at a given time making more paperclips seemed like a better choice than researching cryptography. (All intelligent agents may at some moment face a choice between developing an abstract theory with uncertain possible future gains vs. pursuing their goals more directly; and they may make a wrong choice.)
The crypto here is a bit of a red herring; you want that in adversarial contexts, but a paperclipper may not necessarily optimize much for adversaries (the universe looks very empty). However, a lot of agents are going to research error checking and correction, because you simply can’t build very advanced computing hardware without ECC somewhere in it: a good chunk of every hard drive is devoted to ECC for each sector, and discs like DVDs/BDs have a lot of ECC built in as well. And historically, ECC either predates the most primitive general-purpose digital computers (scribal textual checks) or closely accompanies them (e.g. Shannon’s theorem), and of course we have many natural examples (the redundancy in how DNA codons code for amino acids turns out to be highly optimized in an ECC sense).
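As a concrete illustration of the technique, here is a minimal Hamming(7,4) sketch, one of the simplest ECCs: 4 data bits are stored as 7, and any single flipped bit can be located and corrected. This is only an illustration of the general idea, not a claim about what a paperclipper would actually build.

```python
def encode(d):                      # d = [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]   # codeword positions 1..7

def decode(c):                      # c = received 7-bit word
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # checks positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # checks positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # checks positions 4,5,6,7
    error_pos = s1 + 2 * s2 + 4 * s3    # 0 means no single-bit error detected
    if error_pos:
        c[error_pos - 1] ^= 1       # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]] # recover d1..d4

word = encode([1, 0, 1, 1])
word[5] ^= 1                        # corrupt one bit in storage/transit
print(decode(word))                 # -> [1, 0, 1, 1]
```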
So, it seems pretty probable that ECC is a convergent instrumental technique.
E.g., proofreading in biology.