GPT-17 solves alignment in a human-understandable manner, and offers to provably incorporate human-aligned utility functions into its own code.
I don’t think an unaligned super-intelligence selecting for things that look like an alignment solution to humans is likely to produce anything good, especially when there is some weak selection pressure towards pulling a gotcha.
having backups reduces risk and can be useful in unexpected ways later, even if you feel certain your alternative is safe. Conservation of existing resources is a convergent instrumental goal, it turns out.
Let’s suppose that at this stage, GPT-17 has nanotech. Why do you expect humans to be a better backup than some other thing it could build with the same resources? Also, if you include low-probability events where humans save the superintelligence (very low probability), then you should include the similarly unlikely scenarios where humans somehow harm the superintelligence. Both are too unlikely to ever happen in reality.
Let’s suppose that at this stage, GPT-17 has nanotech.
There are many things that you can suppose. You could just as well assume that GPT-17 has no nanotech. Creating nanotech might require the development of highly complex machinery under particular conditions, and it might very well be that those nanotech factories are not ready by the time this AGI is created.
Also, if you include low-probability events where humans save the superintelligence (very low probability), then you should include the similarly unlikely scenarios where humans somehow harm the superintelligence.
The way I see this is that the OP just presented a small piece of fiction. Why is he/she not allowed to imagine an unlikely scenario? It is even stated in the opening paragraph! Why does he/she have to present an equally unlikely scenario of the opposite sign?
I feel this reply is basically saying: I don’t like the text because it is not what I think is going to happen. But it is not really engaging with the scenario presented here, nor giving any real reasons why the story can’t happen this way.
One more thing: this shows me how biased towards doomerism LW is. I think that if the OP had published a similar story depicting a catastrophic scenario, it probably would have received a much better reception.