Why the assumption that an AGI will be smart enough to drug the entire population with soma before it is smart enough to recognize that soma-happiness isn’t the same thing as happiness? It’s a clear argument for why we should never hard-code values into a general AI in a way that can’t be changed once we figure out what is wrong with them.
The problem is that we need to be careful about assuming that more intelligence automatically draws a distinction between soma-happiness and real-happiness. We’re pretty sure we know what the right answer is, so we assume anything at least as smart as we are will draw the same distinction.
We don’t need to wait for AI for counterexamples; humans already spend a lot of time arguing about what counts as soma-happiness and what counts as real-happiness. Some examples of contested classifications, off the top of my head: watching football, using ecstasy, no-strings-attached sex, reading Fifty Shades of Grey, etc.
I can think of one common division (intellectual pleasures are better than physical pleasures), but I’m not confident that’s a good rule, period, let alone that the distinction is clear enough to train an AI on, and I’m definitely not confident that it’s a safe goal. (For Dollhouse fans: I could imagine this resulting in a slightly nicer version of the Attic.)
So just turbocharging a brain isn’t likely to give us a clear algorithm to distinguish between higher and lower pleasures. In this case, you’d have a heckuva time getting the experimenters to agree on classifications for the training set.
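If you wanted to make that disagreement concrete, the standard move is to measure inter-annotator agreement on the proposed training labels. Here is a minimal sketch (the annotators, labels, and examples are made up for illustration) that computes Cohen’s kappa by hand:

```python
from collections import Counter

# Two hypothetical annotators label the contested examples above as "higher" or
# "lower" pleasure. The labels are invented for illustration, not real data.
examples    = ["watching football", "using ecstasy", "no-strings-attached sex", "Fifty Shades of Grey"]
annotator_a = ["lower",  "lower", "lower",  "higher"]
annotator_b = ["higher", "lower", "higher", "lower"]

def cohens_kappa(a, b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters independently pick the same category.
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(a) | set(b))
    return (observed - expected) / (1 - expected)

print(cohens_kappa(annotator_a, annotator_b))  # -0.5 with these made-up labels: worse than chance
```

A kappa at or below zero means the raters agree no better than chance, which is roughly what you’d expect if “higher vs. lower pleasure” isn’t a crisp category to begin with.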
But the Sequence linked here goes beyond the soma-happiness example, where humans have trouble classifying consistently. Even if we’re really confident in our distinction (say, cats vs. rocks), it’s hard to be sure that’s the only way to partition the world. It’s the most useful partition for the way humans interact with cats and rocks, but a different agent, shown examples of both, might learn a less salient-for-humans distinction (warm things vs. cool things) and be unable to correctly classify a dead cat or a warm hearthstone.
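As a toy illustration of the “wrong partition” worry: suppose each training example is described by two features, and the learner simply picks whichever single feature best separates the labels (a one-rule / decision-stump learner). Everything below is invented for the sketch:

```python
# Toy "wrong partition" sketch: each example has two features, and the learner just
# picks whichever single feature best separates the training labels (a decision stump).
# All features and data are invented for illustration.
training = [
    # (is_furry, is_warm, label)
    (1, 1, "cat"),
    (1, 1, "cat"),
    (0, 0, "rock"),
    (0, 0, "rock"),
]

def train_stump(data):
    best_feature, best_accuracy = None, -1.0
    for feature in (0, 1):  # 0 = is_furry, 1 = is_warm
        # Candidate rule: feature present -> "cat", feature absent -> "rock".
        accuracy = sum(("cat" if row[feature] else "rock") == row[2] for row in data) / len(data)
        if accuracy >= best_accuracy:  # ties broken arbitrarily; here the later feature wins
            best_feature, best_accuracy = feature, accuracy
    return best_feature

chosen = train_stump(training)  # both features score 1.0 on the training set; warmth wins the tie

def predict(is_furry, is_warm):
    return "cat" if (is_furry, is_warm)[chosen] else "rock"

print(predict(1, 0))  # dead (cold) cat  -> "rock" when warmth was chosen
print(predict(0, 1))  # warm hearthstone -> "cat" when warmth was chosen
```

Both features are perfectly predictive on the training set, so nothing in the data tells the learner that furriness is the distinction we meant; if warmth wins the tie, the dead cat and the hearthstone come out wrong.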
You can say that we can fix this problem by showing the AI dead cats and hot rocks in the training set, but that’s only a patch for the problem I just raised. The trouble is that, as humans, we don’t think about radiant heat when we distinguish cats from rocks, so we don’t flag it as a possible error mode. The danger is that we’re going to be really bad at coming up with possible errors in advance, and we’ll end up training AIs that look like they’re making correct classifications but fail once we let them out of the box.
What’s confusing here is the assumption that intelligence sits on a single scale that preserves relative rankings. What you are really afraid of is something that is intelligent in a different way than we are, with hard-coded values that only appear similar to ours until they start to be realized on a large scale: a genie.
What I am saying is that long before we create a genie, we need to create a lesser AI that is capable of figuring out what we are wishing for.
If we’re going to hard-code any behavior at all, we need to hard-code honesty. That way we can at least ask questions and be sure that we are getting the true answer, rather than an answer calculated to convince us to let the AI ‘out of the box’.
In any case, the first goal for a suitably powerful AI should be “Communicate to humans how to create the AI they want.”