I’m pretty pessimistic on the chances of that one. You’re banking on what Val is describing as “superintelligences” being dumber than you are, despite the fact that they have recruited your brain to work for their goals. You’re clearly smart enough to make the connection “If I design an AI this way, I might not get what my hypercreature wants”, since you just stated it. That means you’re smart enough to anticipate it happening, and that’s going to activate any defenses you have.
If this were true, then any attempt to improve your rationality or reduce the impact of hypercreatures on your mind would be doomed, since they would realize what you’re doing and prevent you from doing it.
In my model, “hypercreatures” are something like self-replicating emotional strategies for meeting specific needs, that undergo selection to evolve something like defensive strategies as they emerge. I believe Val’s model of them is similar because I got much of it from him. :)
But there’s a sense in which the emotional strategies have to be dumber than the entire person. The continued existence of the strategies requires systematically failing to notice information that’s often already present in other parts of the person’s brain and which would contradict the underlying assumptions of the strategies (Val talks a bit about how hypercreatures rely on systematically cutting off curiosity, at 3:34–9:22 of this video).
And people already do the equivalent of “doing a thing which might lead to the removal of the hypercreature”. For instance, someone may do meditation/therapy on an emotional issue, heal an emotional wound which happens to also have been the need fueling the hypercreature, and then find themselves being unexpectedly more calm and open-minded around political discussions that were previously mind-killing to them. And rather than this being something that causes the hypercreatures in their mind to make them avoid any therapy in the future, they might find this a very positive thing that encourages them to do even more therapy/meditation in the hopes of (among other things) feeling even calmer in future political discussions. (Speaking from personal experience here.)
This “enslavement to hypercreatures” typically happens because the person it has “taken over” perceives it to have value.
I agree, in part. Hypercreatures are instantiated as emotional strategies that fulfill some kind of a need. Though “the person perceives it to have value” suggests that it’s a conscious evaluation, whereas my model is that the evaluation is a subconscious one. Which makes something like “possession” a somewhat apt (even if imperfect) description, given that the person isn’t consciously aware of the real causes of why they act or believe the way they do, and may often be quite mistaken about them.
I’m in agreement with a lot of what you’re saying.
I agree that people’s “perceptions of value”, as it pertains to what influences them, are primarily unconscious.
I agree that “possession” can be a usefully accurate description, from the outside.
I agree that people can do “things which might lead to the removal of the hypercreature”, like meditation/therapy, and that not only will it sometimes remove that hypercreature but also that the person will sometimes be conditioned towards rather than away from repeating such things.
I agree that curiosity getting killed is an important part of their stability, that this means that they don’t update on information that’s available, and that this makes them dumb.
I agree that *sometimes* people can be “smarter than their hypercreature”, in that they can be aware of and reason about things that their hypercreatures cannot, due to said dumbness.
I disagree about the mechanisms of these things. This leads me to prefer different framings, which make different predictions and suggest different actions.
I think I have about three distinct points.
1) When things work out nicely, when hypercreatures don’t mount defenses and the whole thing gets conditioned towards rather than away from, it’s not so much “the hypercreatures were too dumb because they didn’t evolve to notice this threat”; it’s that you don’t give them the authority to stop you.
From the inside, it feels more like “I’m not willing to [just] give up X, because I strongly feel that it’s right, but I *am* willing to do process Y knowing that I will likely feel different afterwards. I know that my beliefs/priorities/attachments/etc will likely change, and in ways that I cannot predict, but I anticipate that these changes will be good and that I won’t lose anything not worth losing.” And then when you go through the process and give up on having the entirety of X, it feels like “This is super interesting because I couldn’t see it coming, but this is *better* than X in every way, according to every value X was serving for me”. It will not feel like “I must do this without thinking about it too much, so that I don’t awaken the hypercreatures!” and it will not feel like “Heck yeah! I freed myself from my ideological captor by pulling a fast one it couldn’t see coming! I win, you lose!”
Does your experience differ?
2) When those defenses *do* come out, it’s because people trust the hypercreatures more than they trust the process that aims to rid them of those hypercreatures.
It may look super irrational when, say, Christians do all sorts of mental gymnastics when debating atheists. However, “regular people” do the same thing when debating flat earthers. A whole lot of people can’t actually figure things out on the object level, so they default to trusting that society has come to the correct consensus. This refusal to follow their own reasoning (as informed by their debate partner) when it conflicts with their faith in society is actually valid here, and leads to the correct conclusion. Similar things can hold when the Christian refuses to honestly look at the atheist’s arguments, knowing that they might find themselves losing their faith if they did. Maybe that faith is actually a good thing for them, or at least losing the faith *in that way* would be bad for them. If you take a preacher’s religion from him, then what is he? From an inside perspective, it’s not so much that he’s “possessed” as that holding on to his religion is his only way to protect his ability to keep a coherent and functioning life. It appears to be a much more mutually symbiotic relationship from the inside, even if it sometimes looks like a bad deal from the outside, when you have access to a broader set of perspectives.
The prediction here is that if you keep the focus on helping the individual, and are careful not to do anything that seems bad in expectation from the inside (e.g. weighting your own perspective on what’s “true” more heavily than they subconsciously trust your perspective on truth to be beneficial to them), you can preempt any hypercreature defenses and not have to worry about whether your approach is the kind of thing they could have evolved a defense against.
3) When people don’t have that trust in the process, hypercreatures will notice anything that the person notices, because the person is running on hypercreature logic.
When you trust your hypercreatures more than your own reasoning or the influence of those attempting to influence you, you *want* to protect them to the full extent of your abilities. To the extent that you notice “I might lose my hypercreature”, this is bad and will panic you because regardless of what you tell yourself and how happy you are about depending on such things, you actually want to keep it (for now, at least). This means that if your hypercreature is threatened by certain information, *you* are threatened by that information. So you refuse to update on it, and you as a whole person are now dumber for it.
Putting these together, reasoning purely in the abstract about FAI won’t save you by avoiding triggering any hypercreatures that have power over you. If they have power over you, it’s because, rightly or wrongly, you (unconsciously) decided that it was in your best interest to give it to them, and you are using your whole brain to watch out for them. If you *can* act against their interests, it’s because you haven’t yet fully conceded yourself to them, and then you don’t have to keep things abstract, because you are able to recognize their problems and limitations and keep them in their place.
Thinking about FAI in the abstract can still help, if it helps you find a process that you trust more than your hypercreatures, but in that case too, you can follow that process yourself rather than waiting to build the AI and press “go”.
EDIT: and working on implementing that alignment process on yourself gives you hands-on experience and allows you to test things on a smaller scale before committing to the whole thing. It’s like building a limited-complexity scale model of a new helicopter type before committing to an eight-seater. To the extent that this perspective is right, trying to do it only in the abstract will make things much harder.