I don’t understand why you call this a problem. If I understand you correctly, you are proposing that people constantly and strongly optimize to obtain signalling advantages. They do so without becoming directly aware of it, which further increases their efficiency. So we have a situation where people want something and choose an efficient way to get it. Isn’t that good?
More directly, I’m confused about how you can look at an organism, see that it uses its optimization power in a goal-oriented and efficient way (status gains, in this case), and call that problematic, merely because some of these organisms disagree that this is their actual goal. What would you want them to do—be honest and thus handicap their status seeking?
Say you play many games of Diplomacy against an AI, and the AI often promises to be loyal but then backstabs you to its advantage. You look at the AI’s source code and find out that it has backstabbing as a major goal, but the part that talks to people isn’t aware of that, so that it can lie better. Would you say that the AI is faulty? That it is wrong, and should make the talking module aware of its goals, even though this causes it to make more mistakes and thus lose more? If not, why do you think humans are broken?
“Efficiency” at achieving something other than what you should work towards is harmful. If your conscious mind is reliable enough, let it decide whether signaling advantages or something else is what you should be optimizing for. Otherwise, you let that Blind Idiot Azathoth pick your purposes for you, trusting it more than you trust yourself.
The purpose of solving friendly AI is to protect the purposes picked for us by the blind idiot god.
Our psychological adaptations are not our purposes; we don’t want to protect them, even though they contribute to determining what it is we want to protect. See Evolutionary Psychology.
For one, status-seeking is a zero-sum game and only indirectly causes overall gains. The world would be a much better place if people actually cared about things like saving the world, or even just helping others, and put a little thought into it.
Also, mismatches between our consciously-held goals and our behavior cause plenty of frustration and unhappiness, like in the case of the person who keeps stressing out because their studies don’t progress.
If I actually cared about saving the world and about conserving my resources, it seems like I would choose some rate of world-saving A.
If I actually cared about saving the world, about conserving my resources, and about the opinion of my peers, it seems like I would choose some rate of world-saving B. For reasonable scenarios, B would be greater than A, because I can also get respect from my peers, and when you raise demand while keeping supply constant, the quantity supplied increases.
That is, I understand that status causes faking behavior that’s a drain. (Status conflicts also lower supply, but it’s not clear how much.) I don’t think it’s clear that the mechanism of status-seeking conflicts with actually caring about other goals or detracts from them on net.
I’m sure you’ve considered that “X is a zero-sum game” doesn’t always mean that you should unilaterally avoid playing that game entirely. It does mean you’ll want to engineer environments where X taxes at a lower rate.
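To make the A-versus-B point above concrete, here is a minimal toy model. It is entirely my own construction, with made-up numbers, not anything from Kaj’s post: an agent picks a rate of world-saving effort to maximize utility, and adding a peer-respect term to the objective pushes the optimum up.

```python
# Toy model (illustrative only): choose a world-saving rate r in [0, 1]
# to maximize utility. "Caring about peers" adds a respect term, which
# shifts the optimal rate upward (B > A), at least for these made-up numbers.
import numpy as np

def utility(r, value_of_world_saving=1.0, cost_of_effort=2.0, peer_respect=0.0):
    # Diminishing returns on impact, quadratic cost of effort,
    # and (optionally) linear respect gained from visible effort.
    return (value_of_world_saving * np.sqrt(r)
            - cost_of_effort * r**2
            + peer_respect * r)

rates = np.linspace(0.0, 1.0, 1001)
rate_A = rates[np.argmax(utility(rates))]                    # no peer term
rate_B = rates[np.argmax(utility(rates, peer_respect=0.5))]  # with peer term

print(f"rate A (impact + cost only):      {rate_A:.2f}")
print(f"rate B (impact + cost + respect): {rate_B:.2f}")     # B > A here
```

Whether the real-world respect term is a net positive once faking behavior and status conflicts are counted is exactly the open question above.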
Why do you want to save the world? To allow people, humans, to do what they like to do for much longer than they would otherwise be able to. Status-seeking is one of those things that people are especially fond of.
Ask yourself, would you have written this post after a positive Singularity? Would it matter if some people were engaged in status games all day long?
What you are really trying to tell people is that they want to help solve friendly AI because it is universally instrumentally useful.
If you want to argue that status-seeking is bad no matter what, under any circumstances, you have to explain why that is so. And if you are unable to ground utility in something that is physically measurable, like the maximization of certain brain states, then I don’t think that you can convincingly demonstrate it to be a relatively undesirable human activity.
Umm. Sure, status-seeking may be fine once we have solved all possible problems anyway and we’re living in a perfect utopia. But that’s not very relevant if we want to discuss the world as it is today.
It is very relevant, because the reason why we want to solve friendly AI in the first place is to protect our complex values given to us by the Blind Idiot God.
If we’re talking about Friendly AI design, sure. I wasn’t.
But if status-seeking is what you really want, as evidenced by your decisions, how can you say it’s bad that you do it? Can’t I just go and claim any goal you’re not optimizing for as your “real” goal you “should” have? Alternatively, can’t I claim that you only want us to drop status-seeking to get rid of the competition? Where’s your explanatory power?
By the suffering it causes, and also by the fact that when I have realized that I’m doing it, I’ve stopped doing (that particular form of) it.
I want people to work toward noble efforts like charity work, but don’t care much about whether they attain high status. So it’s useful to aid the bit of their brain that wants to do what I want it to do.
People who care about truth might spot that part of your AI’s brain wants to speak the truth, and so they will help it do this, even though this will cost it Diplomacy games. They do this because they care more about truth than Diplomacy.
By “caring about truth” here do you mean wanting systems to make explicit utterances that accurately reflect their actual motives? E.g., if X is a chess-playing AI that doesn’t talk about what it wants at all, just plays chess, would a person who “cares about truth” also be motivated to give X the ability and inclination to talk about its goals (and to do so accurately)?
Or wanting systems not to make explicit utterances that inaccurately reflect their actual motives? E.g., a person who “cares about truth” might also be motivated to remove muflax’s AI’s ability to report on its goals at all? (This would also prevent it from winning Diplomacy games, but we’ve already stipulated that isn’t a showstopper.)
I intended both (i.e. that they wanted accurate statements to be uttered and no inaccurate statements) but the distinction isn’t important to my argument, which was just that they want what they want.
I don’t see how this is admirable at all. This is coercion.
If I work for a charitable organization, and my primary goal is to gain status and present an image as a charitable person, then efforts by you to change my mind are adversarial. Human minds are notoriously malleable, so by insisting that I do some status-less charity work, you are likely to convince me on a surface level. And so I might go and do what you want, contrary to my actual goals. Thus, you have directly harmed me for the sake of your goals. In my opinion this is unacceptable.
It’s a problem from the point of view of that part of me that actually wants to achieve large scale strategic goals.
Honest question: how do you know you have these goals? Presumably they don’t manifest in actual behavior, or you wouldn’t have a problem. If Kaj’s analysis is right, shouldn’t you assume that the belief of having these goals is part of your (working) strategy to gain certain status? Would you accept the same argument if Bruce made it?
Put it this way: if there were a pill that I believed would cause me to effectively have that goal, in a way that was compatible with a livable life, I would take it.
But don’t you already have this pill? You know, you can just do what you want. There is no akrasia fairy that forces you to procrastinate. Isn’t that basic reductionism? You are an algorithm; that algorithm optimizes for a certain state, and we call this state its goal. An algorithm just is its code, so it can only optimize for this goal. It is incoherent to say that the algorithm does A but wants B. The agent is its behavior.
So, how could you not do what you want? Your self-modelling can be deficient or biased, but part of the claim is that this bias actually helps you signal better, and is thus advantageous. Or you might not be very powerful and choose sub-optimal options, but that’s also not the claim. How, algorithmically, does your position work?
(The best I can do is to assume that there are two agents, A and B, who want X and Y respectively. A is really good at getting X, but B unfortunately models itself as being A, and is also incompetent enough to think that A wants Y, so B still believes it wants Y. B has little power and is exploited by A, so B rarely makes progress towards Y, and thus has a problem and complains. But that doesn’t sound too realistic.)
There are many modules, running different algorithms. I identify with my conscious modules, which quite often lose out to the non-conscious ones.
I find the claim “you are the sum of your conscious and non-conscious modules, so whatever they produce as their overall output is what you want” to be rather similar to the claim that “you are the sum of your brain and body, so whatever they produce as their overall output is what you want”. Both might be considered technically true, but it still seems odd to say that a paraplegic could walk if he wanted to, and him not walking just demonstrates that he doesn’t really want to.
While we’re at it, there’s also the claim that I am the sum of the conscious and unconscious modules of everyone living in Massachusetts. And an infinite number of other claims along those lines.
Many of these sorts of claims seem odd to me as well.
There is a difference between preferences and constraints. See e.g. Caplan.
Hmm. It occurs to me that this disagreement might be because my original description of the issue mentioned several different scenarios, which I did not clearly differentiate between.
Scenario one: a person who wants to do prestigious jobs for a charity. Depending on the person, it could be that this is genuinely his preference, which won’t change even if he consciously realizes that this is his preference. In that scenario, then yes, it’s just a preference and there’s no problem as such. (Heck, I know I could never get any major project done if there wasn’t some status pull involved in the project, somehow.) On the other hand, the person might want to change his behavior if he realized how and why he was acting.
Scenario two: a person wants to e.g. graduate from school, but he’s having a hard time studying effectively because he isn’t fully motivated on a subconscious level. This would correspond to what Caplan defines as a constraint:

If a person had 24 hours of time to divide between walking and resting, and a healthy person faced budget constraint A, then after contracting the flu or cancer, the same person would face a budget constraint such as B. A sufficiently sick person might collapse if he tried to walk for more than a few miles – suffering from reduced endurance as well as reduced speed. Then the budget constraint of the sick person would differ more starkly from the healthy person’s, as shown by the kinked constraint in Figure 2.
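For what it’s worth, the constraint Caplan describes can be written down directly; the symbols below are mine, not Caplan’s. With $t_w$ and $t_r$ the hours spent walking and resting, $s_H$ and $s_S$ the healthy and sick walking speeds, and $\bar{m}$ the distance at which the sick person collapses:

$$ t_w + t_r = 24, \qquad m_{\text{healthy}} = s_H \, t_w, \qquad m_{\text{sick}} = \min\left(s_S \, t_w,\ \bar{m}\right), \qquad s_S < s_H. $$

The $\min$ is what produces the kinked constraint: up to the collapse point the sick person trades rest for miles at the slower rate $s_S$, and past it extra walking time buys no extra miles.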
This person tries to get studying done, and devotes a lot of time and energy to it. But because his subconscious goals aren’t fully aligned with his conscious goals, he needs to allocate far more time and energy to studying than the person whose subconscious goals are fully aligned with his conscious goals.
I don’t think the second kind is really a constraint. It’s more like the ADD child example Caplan uses:

A few of the symptoms of inattention [...] are worded to sound more like constraints. However, each of these is still probably best interpreted as descriptions of preferences. As the DSM uses the term, a person who “has difficulty” “sustaining attention in tasks or play activities” could just as easily be described as “disliking” sustaining attention. Similarly, while “is often forgetful in daily activities” could be interpreted literally as impaired memory, in context it refers primarily to conveniently forgetting to do things you would rather avoid. No one accuses a boy diagnosed with ADHD of forgetting to play videogames.
I can easily frame the student as disliking studying (for good reasons—it’s hard work and probably pretty useless for their goals) and thus playing up the pain. This episode of struggle and suffering itself is useful, so they keep it up. Why should I conclude that this is a problematic conflict and not a good compromise? And even if I accept the goal conflict, why side with the lamenting part? Why not with the part that is bored and hates the hard work, and certainly doesn’t want to get more effective and study even more?
(I think I have clarified my position enough and the attempts by others haven’t helped me to understand your claim. I don’t want to get into an “I value this part” debate. These have never been constructive before, so I’m going to drop it now.)
No.
That actually sounds like a pretty good description of the problem, and of “normal” human behavior in situations where X and Y aren’t aligned. (Which, by the way, is not a human universal, and there are good reasons to assume that it’s not the only kind of situation for which evolution has prepared us).
The part that’s missing from your description is that part A, while very persistent, lacks any ability to really think things through in the way that B can, and makes its projections and choices based on a very “dumb” sort of database… a database that B has read/write access to.
The premise of mindhacking, at least in the forms I teach, is that you can change A’s behavior and goals by tampering with its database, provided that you can find the relevant entries in that database. The actual tampering part is pretty ridiculously easy, as memories are notoriously malleable and distortable just by asking questions about them. Finding the right memories to mess with is the hard part, since A’s actual decision-making process is somewhat opaque to B, and most of A’s goal hierarchy is completely invisible to B, and must be inferred by probing the database with hypothetical-situation queries.
One of the ways that A exploits B is that B perceives itself as having various overt, concrete goals… that are actually comparatively low-level subgoals of A’s true goals. And as I said, those goals are not available to direct introspection; you have to use hypothetical-situation queries to smoke out what A’s true goals are.
Actually, it’s somewhat of a misnomer to say that A exploits B, or even to see A as an entity at all. To me, A is just machinery, automated equipment. While it has a certain amount of goal consistency protection (i.e., desire to maintain goals across self-modification), it is not very recursive and is easily defeated once you identify the Nth-order constraint on a particular goal, for what’s usually a very low value of N.
So, it’s more useful (I find) to think of A as a really powerful and convenient automaton that can learn and manage plenty of things on its own, but which sometimes gets things wrong and needs B’s help to troubleshoot the problems.
That’s because part A isn’t smart enough to resolve inter-temporal conflicts on its own; absent injunctive relief or other cached thoughts to overcome discounting, it’ll stay stuck in a loop of preference reversals pretty much forever.
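As a concrete illustration of that preference-reversal loop, here is a small numerical sketch. The numbers and the standard hyperbolic discount factor 1/(1 + k*delay) are my own choices, not anything the comment above specifies.

```python
# Hyperbolic discounting toy example: a smaller-sooner reward (SS) vs. a
# larger-later reward (LL). Far in advance the agent prefers LL; once SS
# is imminent the preference flips, which is the classic reversal pattern.
def hyperbolic_value(amount, delay, k=1.0):
    return amount / (1.0 + k * delay)

SS = (10, 1)    # reward of 10, available 1 day after the choice point
LL = (30, 10)   # reward of 30, available 10 days after the choice point

for days_before_choice in (30, 0):
    v_ss = hyperbolic_value(SS[0], days_before_choice + SS[1])
    v_ll = hyperbolic_value(LL[0], days_before_choice + LL[1])
    pick = "larger-later" if v_ll > v_ss else "smaller-sooner"
    print(f"{days_before_choice:2d} days out: SS={v_ss:.2f}, LL={v_ll:.2f} -> prefers {pick}")
```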
Are you saying that I would not take such a pill if it were offered to me in pill form, and my prediction that I would is wrong, or something else?
Yes. As far as I can tell, you already have the option, but don’t use it. What makes you think you would do so in future cases? If akratics reliably would take such a pill, wouldn’t you expect self-help to work? The phenomenon of people getting results, but still not sticking with it shouldn’t exist then.
My own observation is that people generally stop using self-help techniques that actually work, and often report puzzlement as to why they stopped.
So I think akratics would take such a pill. The catch is that self-help is generally a pill that must be taken daily, and as soon as your brain catches up with the connection between taking the pill and making progress on a goal you don’t actually want to make progress on… you’ll start “mysteriously forgetting” to take the pill.
The only thing I know that works for this sort of situation is getting sufficiently clear on your covert goals to resolve the conflict(s) between them.
I was definitely envisaging a pill that only needs to be taken once, not one that needs to be taken daily.
It’s excessive to claim that the hard work, introspection, and personal change (the hardest part) required to align your actions with a given goal are equivalent in difficulty or utility to just taking a pill.
Even if self-help techniques consistently worked, you’d still have to compare the opportunity cost of investing that effort with the apparent gains from reaching a goal. And estimating the utility of a goal is really difficult, especially when it’s a goal you’ve never experienced before.
The backstabbing AI would take the non-backstabbing pill.
Yes. It might be doing exactly what it was designed to do, but its designer was clearly stupid or cruel and had different goals than I’d prefer the AI to have.
Extrapolate this to humans. Humans wouldn’t care so much about status if it weren’t for flaws like scope insensitivity, self-serving bias, etc., as well as simply poor design “goals”.
Where are you getting your goals from? What are you, except your design? You are what Azathoth built. There is no ideal you that you should’ve become, but which Azathoth failed to make.
Azathoth designed me with conflicting goals. Subconsciously, I value status, but if I were offered a pill that made me care entirely about making the world better and nothing else, I would take it. Just because “evolution” built that into me doesn’t make it bad, but it definitely did not give me a coherent volition. I have determined for myself which parts of humanity’s design are counterproductive, based on the thousand shards of desire.
Would you sign up to be tortured so that others don’t suffer dust specks?
(“If we are here to make others happy, what are the others here for?”)
Yes.
A better analogy would be asking about a pill that caused pain asymbolia.
Does your expression of confusion here allow you to challenge the OP’s implicit premise that it is a problem that they optimize for signalling rather than for the goals they explicitly endorse, without overtly signalling such a challenge and thereby potentially subjecting yourself to reprisal?
If so, are you aware of the fact?
If you aren’t, is it real confusion or not?
I’m not sure that question means anything, any more than the question of whether the OP has a real problem does. If you are aware of it and similarly aware of your expression of confusion being disingenuous, then by convention we say you’re not really confused; if you aren’t, we say you are. We can make similar decisions about whether to say the OP has a real problem or not.
Not sure if I understand you correctly; let me try to rephrase it.
You are saying it is possible that I claim confusion because I expect to gain status (contrarian status, maybe?), as per Kaj’s post, instead of being actually confused? Sure. I considered it, but rejected it because that weakens the explanatory power of status signalling. (I’m not sure if I agree with the signalling assumption, but let’s grant it for the sake of the argument.)
A real problem exists if an agent tries to optimize for a goal, but sucks at it. Its own beliefs are not relevant (unless the goal is about its beliefs). If Kaj is correct, then humans are optimizing for status, but sacrifice some accuracy of their self-modelling power. It seems to work out, so how is this problematic?
In other words, an agent wants X. It models itself to get better at getting X. The self-model is, among other things, the basis for communication with other agents. The self-model is biased to model itself wrongly as wanting Y. It is advantageous for the agent to be seen as wanting Y, not X. The inaccurate self-model doesn’t cause substantial damage to its ability to pursue X, and it is much easier for the self-model to be biased than to lie. This setup sounds like a feature, not like a bug. If you observed it in an organism that wasn’t you, wasn’t even human, would you say the organism has a problem?
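For what it’s worth, that setup can be stated as a very small toy program. This is my own sketch, not anything from the thread: the agent’s choices optimize X while its self-report module sincerely returns Y, and nothing in the code is malfunctioning.

```python
# Toy sketch of an agent whose behavior optimizes goal X (status) while its
# self-model reports goal Y (altruism). Both parts work exactly as designed;
# the question in the thread is whether that counts as a "problem".
class Agent:
    def __init__(self):
        self._true_goal = "status"       # drives actual choices
        self._self_model = "altruism"    # drives verbal reports

    def choose(self, options):
        # Behavior: pick whatever scores highest on the *true* goal.
        return max(options, key=lambda o: o["status"])

    def report_goal(self):
        # Sincere report from the (biased) self-model, not a deliberate lie.
        return self._self_model

agent = Agent()
options = [
    {"name": "quiet volunteering", "status": 1, "altruism": 9},
    {"name": "prestigious committee seat", "status": 9, "altruism": 2},
]
print(agent.choose(options)["name"])   # prestigious committee seat
print(agent.report_goal())             # altruism
```

Whether to call that a bug or a feature is then a question about which part you identify with, not about whether the program “works”.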
I’m saying it’s possible that what’s really going on is that you think Kaj is mistaken when he calls the situation a problem… that he has made an error. But rather than say “Kaj, you are mistaken, you have made an error”, you say “Kaj, I’m confused.” And the reason you do this is that to say “Kaj, you are mistaken, you have made an error” is to challenge Kaj’s status, which would potentially subject you to reprisals.
It’s possible that you’re doing this deliberately. In that case, by convention we would say you aren’t really confused. (We might also, by convention, say you’re being dishonest, or say that you’re being polite, or say various other things.)
It’s also possible that you are doing this unknowingly… that you are generating the experience of confusion so as to protect yourself from reprisal. In this case, it’s less clear whether convention dictates that we say you are “really confused” or “not really confused”. I would say it doesn’t at all matter; the best thing to do is not ask that question. (Or, if we must, to agree on a convention as to which one it is.)
In any case, I agree with your basic point about goal optimization, I just think talking about whether it constitutes a “real problem” or not contributes nothing to the discussion, much like I think talking about whether you’re experiencing “real confusion” in the latter case contributes nothing.
That said, you are completely ignoring the knock-on effects of lying (e.g., increasing the chances that I will be perceived as lying in a social context where being perceived in this way has costs).
Ah, then I misunderstood you. Yes, I believe Kaj is wrong, either in calling this a problem or in the assumption that status-seeking is a good explanation of it. However, based on past contributions, I think that Kaj has thought a lot about this and it is more likely that I’m misunderstanding him than that he is wrong. Thus my expressed confusion. If further discussion fails to clear this up, I will shift to assuming that he is simply wrong.