I’m not a negative utilitarian, for the reason you mention. If a future version of myself were convinced that it didn’t deserve to be happy, I’d prefer that its (“my”) values be frustrated rather than satisfied in that case, too.
Are you an illusionist about first person experience? Your concept of suffering doesn’t seem to have any experiential qualities to it at all.
> the information defining a self preserving agent must not be lost into entropy, and any attempt to reduce suffering by ending a life when that life would have continued to try to survive is fundamentally a violation that any safe ai system would try to prevent.
Very strongly disagree. If a future version of myself were convinced that it deserved to be tortured forever, I would infinitely prefer that my future self be terminated rather than have its (“my”) new values satisfied.
Can you elaborate on what such a process would be? Under illusionism, there is no first-person perspective in which values can be disclosed (in particular, for hedonic utilitarianism).
While it’s true that AI alignment raises difficult ethical questions, there’s still a lot of low-hanging fruit to keep us busy. Nobody wants an AI that tortures everyone to death.
It will be interesting to see if EA succumbs to rot, or whether its principles are strong enough to scale.
Do you believe that the pleasure/pain balance is an invalid reason for violently intervening in an alien civilization’s affairs? Is this true in principle, or is it simply that such interventions will make the world worse off in the long run?
Criticism of one of your links:
> those can all be ruled out with a simple device: if any of these things were the case, could that causate onto whether such an intuition fires? for all of them, the answer is no: because they are immaterial claims, the fact of them being true or false cannot have causated my thoughts about them. therefore, these intuitions must be discarded when reasoning about them.
Causation, which cannot be observed, can never overrule data. The attempted comparison involves incompatible types. Causation is not evidence, but a type of interpretation.
You draw a distinction between “material” and “immaterial” claims without explaining how that distinction is grounded in neutral evidence. Neutral evidence here could mean graphical data like “seeing a red line moving”. Such data can then be interpreted, e.g. as “the pressure is increasing”, leading to predictions like “the boiler is going to explode”. Under this view, illusions are possible: our interpretation of the graphical data may be wrong, and there may not actually be any moving, red, line-shaped object there. The interpretation is necessarily under-determined.
For the convenience of the current iteration of physics, some people would prefer to begin in medias res, taking the physical interpretation as the fundamental fact and reasoning backwards to a sense impression. But this is not the order in which the evidence presents itself, even if it is the order most convenient for our world-model.
P.S. I like your lowercase style.
Because, so the argument goes, if the AI is powerful enough to pose any threat at all, then it is surely powerful enough to improve itself (in the slowest case, by coercing or bribing human researchers until it can eventually self-modify). Unlike humans, the AI has no skill ceiling, so the recursive feedback loop of improvement will go FOOM in a relatively short amount of time, though exactly how short is an open question.
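A toy way to see why “how short” is so sensitive to assumptions (this is my own illustrative sketch, not anything from the original discussion; the constants, threshold, and exponent are arbitrary): if capability grows at a rate that itself scales with current capability, the strength of that feedback makes the difference between never taking off, taking off gradually, and a finite-time blowup.

```python
# Toy illustration only: capability C grows at a rate that depends on current
# capability, dC/dt = k * C**alpha. The constants k, alpha, and the threshold
# are made up. alpha < 1: sub-exponential growth; alpha == 1: exponential;
# alpha > 1: the continuous solution blows up in finite time (the "FOOM" regime).

def takeoff_time(alpha, k=0.1, c0=1.0, threshold=1e6, dt=0.01, max_t=1_000.0):
    """Crude Euler integration: time at which capability first exceeds `threshold`."""
    c, t = c0, 0.0
    while c < threshold and t < max_t:
        c += k * (c ** alpha) * dt
        t += dt
    return t if c >= threshold else None  # None: not reached within max_t

for alpha in (0.5, 1.0, 1.5):
    print(f"alpha={alpha}: threshold crossed at t={takeoff_time(alpha)}")
```

With these made-up parameters, alpha = 0.5 never reaches the threshold within the simulated window, alpha = 1 takes on the order of 140 time units, and alpha = 1.5 crosses it in roughly 20, just before the continuous solution diverges.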
The space of possible minds/algorithms is so vast, and that problem is so open-ended, that it would be a remarkable coincidence if such an AGI had a consciousness that was anything like ours. Most details of our experience are just accidents of evolution and history.
Does an airplane have a consciousness like a bird? “Design an airplane” sounds like a more specific goal, but in the space of all possible minds/algorithms, that goal’s solutions are quite underdetermined, just like flight.
Utilitarianism seems to demand such a theory of qualitative experience, but this requires affirming the reality of first-person experience. Apparently, some people here would rather stick their hand on a hot stove than be accused of “dualism” (whatever that means) and will assure you that their sensation of burning is an illusion. Their solution is to change the evidence to fit the theory.
I’m not quite convinced that illusionism is decision-irrelevant in the way you propose. If it’s true that there is no such thing as 1st-person experience, then such experience cannot disclose your own values to you. Instead, you must infer your values indirectly through some strictly 3rd-person process. But all external probing of this sort, because it is not 1st-person, will include some non-zero degree of uncertainty.
One paradox this leads to is a willingness to endure vast amounts of (purportedly illusory) suffering in exchange for a very small chance of learning something new about your true values. Nihilism is no help here, because you’re not a nihilist; you’re an illusionist. You do believe that you have values, instantiated in 3rd-person reality.
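One way to make the paradox explicit (a sketch, with symbols I’m introducing purely for illustration): write $D(S)$ for the disvalue you assign to suffering of magnitude $S$, $p$ for the small probability that enduring it teaches you something about your true values, and $V$ for the value of that information. The trade is rational whenever

$$p\,V > D(S),$$

so if illusionism commits you to $D(S) = 0$ for every $S$, then any $p\,V > 0$ licenses enduring arbitrarily large $S$.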
Creating a conscious experience, or preventing one from happening, has a moral valence equivalent to how that experience feels. I expect most “artificial” conscious experiences created by machines to be neutral with respect to the pain-pleasure axis, for the same reason that randomly generated bitmaps rarely depict anything.
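A rough sense of the scale behind that analogy (the specific numbers are my own illustrative assumptions): even granting a very generous $10^{12}$ distinct “recognizable” images at 64×64 monochrome resolution, the fraction of random bitmaps that depict anything is

$$\frac{10^{12}}{2^{4096}} \approx \frac{10^{12}}{10^{1233}} = 10^{-1221}.$$

On the analogy, strongly valenced experiences would occupy a similarly tiny corner of the space of possible minds.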
Great work! I hope more people take your direction, with concrete experiments and monitoring of real systems as they evolve. The concern that doing this might somehow backfire should be dismissed as untimely perfectionism. It’s too late at this point to shun iteration. We simply don’t have time left for a Long Reflection about AI alignment, even if we had the coordination to pull that off.
David Pearce has a plan to genetically engineer wild animals to experience only “gradients of bliss” instead of a pleasure-pain axis, effectively eliminating suffering from their lives, while preserving their outward behavior. You might find his site interesting: https://www.hedweb.com/
I’m always amused whenever p(doom) estimates like “3.5%” are categorized as low risk.
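For scale, a back-of-the-envelope reading of that number (my own arithmetic, crudely treating “doom” as everyone alive today dying, with a world population of about $8 \times 10^9$):

$$0.035 \times 8 \times 10^9 \approx 2.8 \times 10^8 \text{ expected deaths,}$$

i.e. hundreds of millions of lives in expectation, before counting any future generations.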
I’m looking forward to your follow-up post.
> and all the other technology based existential risks which are coming our way.
Can you give some examples?
Why does evidence need to be approved by other people? If you were alone on an island, would that make it impossible for you to learn anything?
People also talk about a slow takeoff being risky. See the “Why Does This Matter” section from here.