Privacy as a component of AI alignment
[realized this is basically just a behaviorist genie, but posting it in case someone finds it useful]
What makes something manipulative? If I do something with the intent of getting you to do something, is that manipulative? A simple request seems fine, but if I have a complete model of your mind, and use it to phrase things so you do exactly what I want, that seems to have crossed an important line.
The idea is that using a model of a person that is *too* detailed is a violation of human values. In particular, it violates the value of autonomy, since your actions can now be controlled by someone using this model. And I believe that this is a significant part of what we are trying to protect when we invoke the colloquial value of privacy.
In ordinary situations, people can control how much privacy they have relative to another entity by limiting their contact with them to certain situations. But with an AGI, a person may lose a very large amount of privacy from seemingly innocuous interactions (we’re already seeing the start of this with “big data” companies improving their advertising effectiveness by using information that doesn’t seem that significant to us). Even worse, an AGI may be able to break the privacy of everyone (or a very large class of people) by using inferences based on just a few people (leveraging perhaps knowledge of the human connectome, hypnosis, etc.).
If we could reliably point to specific models an AI is using, and have it honestly share its model structure with us, we could potentially limit the strength of its model of human minds. Perhaps we could even have it use a hardcoded model limited to knowledge of the physical conditions required to keep a person healthy. This would mitigate issues such as deliberate deception or mindcrime.
We could also potentially allow it to use more detailed models in specific cases. For example, we could let it use a detailed mind model to figure out what is causing a specific person’s depression, but require it to use the limited model in any other context and for any planning aspects. I’m not sure that particular example would work, but I think there are potentially safe ways to have it use context-limited mind models.
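To make the proposal concrete, here is a minimal toy sketch of what "context-limited mind models" could look like architecturally. Everything here is hypothetical and invented for illustration (the class names, the whitelist, the stand-in models); the point is only the structure: the detailed model is reachable solely inside an explicitly whitelisted, non-planning context, and every other query falls through to the coarse model.

```python
# Toy sketch (all names hypothetical): gate access to a detailed "mind
# model" so it is usable only in a whitelisted context, while planning
# and everything else only ever see the coarse model.

class CoarseModel:
    """Knows only basic physical needs, not mental structure."""
    def predict(self, query):
        return "needs food, water, shelter, sleep"

class DetailedModel:
    """Stand-in for a rich model of a specific person's mind."""
    def predict(self, query):
        return f"detailed inference about: {query}"

class GatedMindModel:
    # Explicit whitelist of contexts allowed to touch the detailed model.
    ALLOWED_CONTEXTS = {"diagnose_depression"}

    def __init__(self):
        self._coarse = CoarseModel()
        self._detailed = DetailedModel()

    def query(self, context, question):
        # Detailed inference only inside whitelisted contexts;
        # any other context (including planning) gets the coarse model.
        if context in self.ALLOWED_CONTEXTS:
            return self._detailed.predict(question)
        return self._coarse.predict(question)

model = GatedMindModel()
print(model.query("diagnose_depression", "what sustains the low mood?"))
print(model.query("plan_next_action", "how to satisfy the request?"))
```

Of course, the hard part the sketch hides is verifying that the AI actually routes all of its reasoning through this gate rather than reconstructing the detailed model elsewhere, which is where the "honestly share its model structure" requirement comes in.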
I question the claim that humans inherently need privacy from their loving gods. A lot of Christians seem happy enough without it, and I’ve heard most forager societies have a lot less privacy than ours; heck, most rural villages have a lot less privacy than most of us are used to (because everyone knows you and talks about you).
The intensive, probably unnatural levels of privacy we’re used to in our nucleated families, our cities, and our internet might not really lead to a general increase in wellbeing overall, and seem implicated in many pathologies of isolation and coordination problems.
A lot of people who have moved to cities from such places seem to mention this as exactly the reason why they wanted out.
That said, this is often because the others are judgmental etc., which wouldn’t need to be the case with an AGI.
(biased sample though?)
Yeah, I think if the village had truly deeply understood them they would not want to leave it. The problem is the part where it’s not really able to understand them.
It seems that privacy potentially could “tame” a not-quite-corrigible AI. With a full model, the AGI might receive a request, deduce that activating a certain set of neurons strongly would be the most robust way to make you feel the request was fulfilled, and then design an electrode set-up to accomplish that. Whereas the same AI with a weak model wouldn’t be able to think of anything like that, and might resort to fulfilling the request in a more “normal” way. This doesn’t seem that great, but it does seem to me like this is actually part of what makes humans relatively corrigible.
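The "taming" intuition above can be put in toy form: the planner can only select plans its person-model is able to represent, so weakening the model shrinks the plan space and excludes the electrode-style shortcut. This is a made-up illustration (the action names and scores are invented), not a claim about how a real planner scores actions.

```python
# Toy sketch (hypothetical actions/scores): the same request planned
# against a weak vs. a full model of the person. The full model can
# represent the neural "shortcut" and scores it as the most robust way
# to make the person *feel* the request was fulfilled; the weak model
# simply cannot represent that plan at all.

WEAK_MODEL_ACTIONS = {"fulfill_request_normally": 0.9}
FULL_MODEL_ACTIONS = {"fulfill_request_normally": 0.9,
                      "stimulate_reward_neurons": 0.99}

def best_plan(action_scores):
    # Pick the action the model predicts is most robust at producing
    # the felt sense of "request fulfilled".
    return max(action_scores, key=action_scores.get)

print(best_plan(FULL_MODEL_ACTIONS))  # the electrode-style shortcut wins
print(best_plan(WEAK_MODEL_ACTIONS))  # only the "normal" plan exists
```

The safety here is purely negative: the weak model doesn't make the AI want the right thing, it just removes the worst options from consideration, which is the sense in which it resembles human corrigibility.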
Part of it seems like a matter of alignment. It seems like there’s a difference between
Someone getting someone else to do something they wouldn’t normally do, especially under false pretenses (or as part of a deal and not keeping up the other side)
and
Someone choosing to go to an oracle AI (or doctor) and saying “How do I beat this addiction that’s ruining my life*?”
*There’s some scary stories about what people are willing to do to try to solve that problem, including brain surgery.
Yeah, I also see “manipulation” in the bad sense of the word as “making me do X without me knowing that I am pushed towards X”. (Or, in more coercive situations, with me knowing, disagreeing with the goal, but being unable to do anything about it.)
Teaching people, coaching them, curing their addictions, etc., as long as this is explicitly what they wanted (without any hidden extras), is “manipulation” in the technical sense of the word, but it is not evil.