I’m in agreement with a lot of what you’re saying.
I agree that people’s “perceptions of value”, as it pertains to what influences them, are primarily unconscious.
I agree that “possession” can be a usefully accurate description, from the outside.
I agree that people can do “things which might lead to the removal of the hypercreature”, like meditation/therapy, and that not only will it sometimes remove that hypercreature but also that the person will sometimes be conditioned towards rather than away from repeating such things.
I agree that curiosity getting killed is an important part of their stability, that this means that they don’t update on information that’s available, and that this makes them dumb.
I agree that *sometimes* people can be “smarter than their hypercreature” in that they can be aware of and reason about things about which their hypercreatures cannot due to said dumbness.
I disagree about the mechanisms of these things. This leads me to prefer different framings, which make different predictions and suggest different actions.
I think I have about three distinct points.
1) When things work out nicely, hypercreatures don’t mount defenses, and the whole thing gets conditioned towards rather than away from, it’s not so much “hypercreatures are too dumb because they didn’t evolve to notice this threat”, it’s that you don’t give them the authority to stop you.
From the inside, it feels more like “I’m not willing to [just] give up X, because I strongly feel that it’s right, but I *am* willing to do process Y knowing that I will likely feel different afterwards. I know that my beliefs/priorities/attachments/etc will likely change, and in ways that I cannot predict, but I anticipate that these changes will be good and that I won’t lose anything not worth losing.” And then when you go through the process and give up on having the entirety of X, it feels like “This is super interesting because I couldn’t see it coming, but this is *better* than X in every way according to every value X was serving for me”. It will not feel like “I must do this without thinking about it too much, so that I don’t awaken the hypercreatures!” and it will not feel like “Heck yeah! I freed myself from my ideological captor by pulling a fast one it couldn’t see coming! I win, you lose!”
Does your experience differ?
2) When those defenses *do* come out, it’s because people trust the hypercreatures more than they trust the process which aims to rid them of those hypercreatures.
It may look super irrational when, say, Christians do all sorts of mental gymnastics when debating atheists. However, “regular people” do the same thing when debating flat earthers. A whole lot of people can’t actually figure things out on the object level, so they default to faith in society’s having come to the correct consensus. This refusal to follow their own reasoning (as informed by their debate partner) when it conflicts with their faith in society is actually valid here, and leads to the correct conclusion. Similar things can hold when the Christian refuses to honestly look at the atheist’s arguments, knowing that they might find themselves losing their faith if they did. Maybe that faith is actually a good thing for them, or at least losing the faith *in that way* would be bad for them. If you take a preacher’s religion from him, then what is he? From an inside perspective, it’s not so much that he’s “possessed” as that the religion is his only way to protect his ability to keep a coherent and functioning life. It appears to be a much more symbiotic relationship from the inside, even if it sometimes looks like a bad deal from the outside when you have access to a broader set of perspectives.
The prediction here is that if you keep the focus on helping the individual and are careful enough not to do anything that seems bad in expectation from the inside (e.g. prioritizing your own perspective on what’s “true” more than they subconsciously trust your perspective on truth to be beneficial to them), you can preempt any hypercreature defenses and not have to worry about whether it’s the kind of thing it could have evolved a defense against.
3) When people don’t have that trust in the process, hypercreatures will notice anything that the person notices, because the person is running on hypercreature logic.
When you trust your hypercreatures more than your own reasoning or the influence of those attempting to influence you, you *want* to protect them to the full extent of your abilities. To the extent that you notice “I might lose my hypercreature”, this feels bad and will panic you, because regardless of what you tell yourself and how happy you are about depending on such things, you actually want to keep it (for now, at least). This means that if your hypercreature is threatened by certain information, *you* are threatened by that information. So you refuse to update on it, and you as a whole person are now dumber for it.
Putting these together, reasoning purely in the abstract about FAI won’t save you by avoiding triggering any hypercreatures that have power over you. If they have power over you, it’s because, rightly or wrongly, you (unconsciously) decided that it was in your best interest to give that power to them, and you are using your whole brain to watch out for them. If you *can* act against their interests, it’s because you haven’t yet fully conceded yourself to them, and you don’t have to keep things abstract because you are able to recognize their problems and limitations, and keep them in their place.
Thinking about FAI in the abstract can still help, if it helps you find a process that you trust more than your hypercreatures, but in that case too, you can follow that process yourself rather than waiting to build the AI and press “go”.
EDIT: and working on implementing that alignment process on yourself gives you hands-on experience and allows you to test things on a smaller scale before committing to the whole thing. It’s like building a limited-complexity scale model of a new helicopter type before committing to an 8-seater. To the extent that this perspective is right, trying to do it only in the abstract will make things much harder.