I’d be stoked if we created AIs that are the sort of thing that can make the difference between an empty gallery, and a gallery with someone in it to appreciate the art (where a person to enjoy the gallery makes all the difference). And I’d be absolutely thrilled if we could make AIs that care as we do, about sentience and people everywhere, however alien they may be, and about them achieving their weird alien desires.
That’s great! So, let’s assume that we are just trying to encode this as a value (taking into account the interests of sentient beings and caring about their well-being and freedom, plus valuing ever more elaborate and diverse sentiences and ever more elaborate and diverse fun subjective experiences).
No, we are not on track for that, I quite agree.
Still, these are not some ill-specified “human values”, and getting there does not require AI systems steerable to arbitrary goals, nor the ability to make arbitrary values robust against “sharp left turns”.
Your parables are great. Nevertheless, the goals and values we have just formulated seem natural and invariant, even though your parables demonstrate that they are not universal.
I strongly suspect that goals and values formulated like this can be made robust against “sharp left turns”.
Let’s try to find counter-examples to what I am saying here. E.g., assume we manage to encode this particular goal, plus the idea of interacting with available sentient beings and taking what they say into account as a firm constraint. Can we create a plausible scenario in which this particular goal or this particular constraint disappears during an unfortunate “sharp left turn”, while assuming that the AI entity or the community of entities doing self-improvement is somewhat competent?