The whole assumption that prompted this scenario is that there's no hard takeoff, so the first AGI is probably around human-level in insight and ingenuity, though plausibly much faster. Under those circumstances, human actions would likely still matter. If the AGI starts aggressively taking over computing resources, humanity will react, and unless the original programmers failed to keep v1.0 from being Skynet-level unfriendly, at least some humans will escalate as far as necessary to get "their" computers back under their control. At that point, it would be trivially easy to start up a mutated version; perhaps even one designed for better friendliness. But once mutations happen, evolution takes over.
Oh, and by the way, checksums may not work to safeguard friendliness even for v1.0. For instance, most humans seem pretty friendly, but the wrong upbringing can turn them bad.
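To make that concrete, here's a minimal Python sketch (all names hypothetical) of the gap: a checksum certifies that the agent's static code is unchanged, but says nothing about mutable learned state acquired at runtime, which is the analogue of "upbringing" here.

```python
import hashlib

# Frozen v1.0 "source code" of a hypothetical agent.
AGENT_CODE = b"def act(state): return policy(state)"

def checksum(blob: bytes) -> str:
    """SHA-256 over the static code only."""
    return hashlib.sha256(blob).hexdigest()

baseline = checksum(AGENT_CODE)

# Two instances run the same verified code...
instance_a = {"code": AGENT_CODE, "learned_values": ["cooperate"]}
instance_b = {"code": AGENT_CODE, "learned_values": ["cooperate"]}

# ...but one has a bad "upbringing": its learned state drifts.
instance_b["learned_values"] = ["defect"]

for name, inst in [("a", instance_a), ("b", instance_b)]:
    ok = checksum(inst["code"]) == baseline
    print(name, "checksum ok:", ok, "values:", inst["learned_values"])
# Both instances pass the checksum; only the static code was verified,
# so the check never sees the divergence in learned values.
```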
Tl;dr: no-mutations is an inherently more conjunctive scenario than mutations; it requires every restart, copy, and patch to preserve the original exactly.