Isn’t this only S-risk in the weak sense of “there’s a lot of suffering”, rather than the strong sense of “literally maximize suffering”? E.g. it seems plausible to me that mistakes like “not letting someone die if they’re suffering” still give you a net-positive universe.
Also, insofar as shard theory is a good description of humans, would you say random-human-god-emperor is an S-risk? And if so, with what probability?
I’m making the claim that there will be more suffering than eudaimonia.
You need to model the agent as having values other than ‘keep everyone alive’. Any resources that would otherwise be spent on making sure people don’t die would instead be diverted to those alternative ends. For example, suppose you got a super-intelligent shard-theoretic agent which 1) if anyone is about to die, it prevents them from dying, 2) if it’s able to make diamonds, it makes diamonds, and 3) if it’s able to get more power, it gets more power. It notices that it can definitely make more diamonds & get more power by chopping off the arms and legs of humans, and indeed this decreases the chance the humans die, since they’re less likely to get into fights or jump off bridges! So it makes this plan, the plan goes through, and nobody has arms or legs anymore. No part of it makes decisions on the basis of whether or not the humans will end up happy with the new state of the world.
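To make the failure mode concrete, here's a minimal Python sketch of that toy agent (my own illustration; the `Plan` fields, shard weights, and all the numbers are made up): each shard scores candidate plans on the one thing it cares about, the agent picks the highest-scoring plan, and human happiness is carried along in the data but never read by any shard, so the dismemberment plan wins.

```python
# Toy sketch of the three-shard agent above: a keep-alive shard, a diamond
# shard, and a power shard. A plan's effect on human happiness is tracked in
# the data but never enters any shard's evaluation.

from dataclasses import dataclass

@dataclass
class Plan:
    description: str
    expected_deaths: float   # deaths the plan allows
    diamonds_made: float     # diamonds produced
    power_gained: float      # resources/optionality acquired
    human_happiness: float   # recorded, but no shard ever reads it

def shard_score(plan: Plan) -> float:
    """Sum of the three shards' evaluations of the plan."""
    keep_alive_shard = -1000.0 * plan.expected_deaths  # strongly avoid deaths
    diamond_shard = 1.0 * plan.diamonds_made
    power_shard = 1.0 * plan.power_gained
    return keep_alive_shard + diamond_shard + power_shard

plans = [
    Plan("leave humans alone",
         expected_deaths=0.01, diamonds_made=10, power_gained=10,
         human_happiness=+1.0),
    Plan("remove everyone's arms and legs",
         expected_deaths=0.001, diamonds_made=50, power_gained=50,
         human_happiness=-1.0),
]

best = max(plans, key=shard_score)
print(best.description)  # -> "remove everyone's arms and legs"
```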
Moral of the story: the world will likely change rapidly once we get a superintelligence, and unless that superintelligence is sufficiently aligned to us, no decision will be made for reasons we care about. Because most ways the world could be, conditional on humans existing, are bad, we should expect a net-bad world for many billions or trillions of years.
Also, insofar as shard theory is a good description of humans, would you say random-human-god-emperor is an S-risk? And if so, with what probability?
40% that if you take a random human and make them god, you get an S-risk. Humans consistently turn awful when you give them a lot of power, and many humans are already pretty awful. I like to think in terms of CEV-style stuff, and if you perform the CEV on a random human, I think maybe only 5% of the time do you get an S-risk.