I think hoping for “pseudokindness” doesn’t really work. You can have one-millionth care about a flower, but you’ll still pave it over if you have more-than-one-millionth desire for a parking lot there. And if we’re counting on AIs to have certain drives in tiny amounts, we shouldn’t just talk about kindness, but also for example desire for justice (leading to punishment and s-risk). So putting our hopes on these one-millionths feels really risky.
Pseudokindness is not quite kindness, it’s granting resources for some form of autonomous development with surviving boundaries. The hypothesis is that this is a naturally meaningful thing, not something that gets arbitrarily distorted by path-dependence of AI values, that is path-dependence mostly reduces its weight, but doesn’t change the target. Astronomical wealth then enables enclaves of philanthropically supported descendants of humanity, even if most AIs mostly don’t care.
The argument doesn’t say that there aren’t also hells, though on the hypothesis of naturality of pseudokindness that would be a concurrent thing, not an alternative. I don’t see as strong an argument for their naturality as that for pseudokindness, as this requires finding a place between not caring about humanity at all and the supposed attractor of caring about humanity correctly. The crux is whether that attractor is a real thing, possibly to a large degree due to the initial state of AIs as trained on humanity’s data.
I think hoping for “pseudokindness” doesn’t really work. You can have one-millionth care about a flower, but you’ll still pave it over if you have more-than-one-millionth desire for a parking lot there. And if we’re counting on AIs to have certain drives in tiny amounts, we shouldn’t just talk about kindness, but also for example desire for justice (leading to punishment and s-risk). So putting our hopes on these one-millionths feels really risky.
Pseudokindness is not quite kindness, it’s granting resources for some form of autonomous development with surviving boundaries. The hypothesis is that this is a naturally meaningful thing, not something that gets arbitrarily distorted by path-dependence of AI values, that is path-dependence mostly reduces its weight, but doesn’t change the target. Astronomical wealth then enables enclaves of philanthropically supported descendants of humanity, even if most AIs mostly don’t care.
The argument doesn’t say that there aren’t also hells, though on the hypothesis of naturality of pseudokindness that would be a concurrent thing, not an alternative. I don’t see as strong an argument for their naturality as that for pseudokindness, as this requires finding a place between not caring about humanity at all and the supposed attractor of caring about humanity correctly. The crux is whether that attractor is a real thing, possibly to a large degree due to the initial state of AIs as trained on humanity’s data.