Historically, a purposeful decision that permanently lifts the whole population out of poverty was never on the table. Overall indifference doesn’t prevent occasional philanthropy, but philanthropists were never that rich. So if there is some alignment (in the pseudokindness sense), the main issue is surviving until some group that cares gets rich enough. That is not straightforward, since destruction of the biosphere is a default side effect of post-human scaling of industry, and some moderation of the overall indifference toward humanity is crucial at that step.
I think hoping for “pseudokindness” doesn’t really work. You can care one-millionth about a flower, but you’ll still pave it over if you have a more-than-one-millionth desire for a parking lot in that spot. And if we’re counting on AIs having certain drives in tiny amounts, we shouldn’t count only kindness but also, for example, a desire for justice (leading to punishment and s-risk). So pinning our hopes on these one-millionths feels really risky.
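To make that tradeoff concrete, here’s a toy weighted-utility sketch (the specific weights are illustrative, not anything the argument commits to): suppose the AI scores the two uses of the spot as

$$U(\text{pave}) = w_{\text{lot}}, \qquad U(\text{spare}) = w_{\text{flower}}, \qquad w_{\text{flower}} = 10^{-6},$$

and paves whenever $U(\text{pave}) > U(\text{spare})$. The flower survives only if $w_{\text{lot}} < 10^{-6}$, i.e. only if every competing use of that exact spot is weighted even more weakly than the one-millionth care, which is why such tiny weights buy so little protection.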
Pseudokindness is not quite kindness; it’s granting resources for some form of autonomous development with surviving boundaries. The hypothesis is that this is a naturally meaningful thing, not something that gets arbitrarily distorted by the path-dependence of AI values: path-dependence mostly reduces its weight but doesn’t change the target. Astronomical wealth then enables enclaves of philanthropically supported descendants of humanity, even if most AIs mostly don’t care.
The argument doesn’t say that there aren’t also hells, though on the hypothesis that pseudokindness is natural, those would be a concurrent thing, not an alternative. I don’t see as strong an argument for the naturality of hells as for pseudokindness, since it requires finding a place between not caring about humanity at all and the supposed attractor of caring about humanity correctly. The crux is whether that attractor is a real thing, possibly to a large degree due to the initial state of AIs as trained on humanity’s data.