The point is, we don’t just want to turn humans into coherent agents; we want to turn them into coherent agents who can be said to have the same preferences as the original humans. But given that we don’t have a theory of preferences for incoherent agents, how do we know that any given trick intended to improve coherence is preference-preserving? Right now we have little to guide us except intuition.
I absolutely agree. The actual question I had written on my sheet, as I tried to figure out what a more powerful “rationality” might include, was “… into coherent agents, with something like the goals ‘we’ wish to have?” Branch #8 above is exactly the art of not having the goals-one-acts-on be at odds with the goals-one-actually-cares-about (and includes much mention of the usefulness of theory).
My impression, though, is that some of the other branches of rationality in the post are very helpful for self-modifying in a manner you’re less likely to regret. Philosophy always holds dangers, but a person approaching the question of “What goals shall I choose?”, and encountering confusing information that may affect what he wants (e.g., encountering arguments in meta-ethics, or realizing his religion is false, or realizing he might be able to positively or negatively affect a disorienting number of lives) will be much better off if he already has good self-knowledge and has accepted that his current state is his current state (vs. if he wants desperately to maintain that, say, he doesn’t care about status and that only utilitarian expected-global-happiness-impacts affect his behavior—a surprisingly common nerd failure mode).
I don’t know how to extrapolate the preferences of myself or other people either, but my guess is that, while further theoretical work is critical, it’ll be easier to do this work in a non-insane fashion in the context of a larger, or more whole-personed, rationality. What are your thoughts here?
very helpful for self-modifying in a manner you’re less likely to regret.
I don’t think regret is the concern here… Your future self might be perfectly happy making paperclips. I almost think “not wanting your preferences changed” deserves a new term… Hmm, “pre-gret”?
Useful concept, bad example.
Upvoted for ‘pregret.’
I like to imagine another copy of my mind watching what I’m becoming, and being pleased. If I can do that, then I feel good about my direction.
You will find people who are willing to bite the “I won’t care when I’m dead” bullet, or at least claim to—it’s probably just the abstract rule-based part of them talking.
will be much better off if he already has good self-knowledge and has accepted that his current state is his current state
Everything here turns on the meaning of “accept”. Does it mean “acknowledge as a possibly fixable truth” or does it mean “consciously endorse”? I think you’re suggesting the latter but only defending the former, which is much more obviously true.
he wants desperately to maintain that, say, he doesn’t care about status and that only utilitarian expected-global-happiness-impacts affect his behavior—a surprisingly common nerd failure mode
Is the disagreement here about what his brain does, or about what parts of his brain to label as himself? If the former, it’s not obviously common; if the latter, it’s not obviously a failure mode.
Those both sound like basically verbal/deliberate activities, which is probably not what Anna meant. I would say “not be averse to the thought of”.
I don’t have much data here, but I guess none of us do. Personally, I haven’t found it terribly helpful to learn that I’m probably driven in large part by status seeking, and not just pure intellectual curiosity. I’m curious what data points you have.
That is interesting to me because finding out I am largely a status maximizer (and that others are as well) has been one of the most valuable bits of information I’ve learned from OB/LW. This was especially true at work, where I realized I needed to be maximizing my status explicitly as a goal and not feel bad about it, which allowed me to do so far more efficiently.
You, upon learning that you’re largely a status maximizer, decided to emphasize status seeking even more, by doing it on a conscious level. But that means other competing goals (I assume you must have some) have been de-emphasized, since the cognitive resources of your conscious mind are limited.
I, on the other hand, do not want to want to seek status. Knowing that I’m driven largely by status seeking makes me want to self-modify in a way that de-emphasizes status seeking as a goal (*). But I’m not really sure either of these responses is rational.
(*) Unfortunately I don’t know how to do so effectively. Before, I’d just spend all of my time thinking about a problem on the object level. Now I can’t help but periodically wonder whether I believe or argue for some position because it’s epistemically justified or because it helps to maximize status. For me, this self-doubt seems to sap energy and motivation without reducing bias enough to be worth the cost.
This is the simple version of the explicit model I have in my head at work now: I have two currencies, Dollars and Status. Every decision I make likely has some impact both in terms of our company’s results (Dollars) and in terms of how I and others will be perceived (Status). The cost in Status of making any given decision is a decreasing function of my current Status. My long-term goal is to maximize Dollars. However, often the correct way to maximize Dollars in the long term is to sacrifice Dollars for Status, bank the Status, and use it to make better decisions later.
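For concreteness, here is a minimal toy sketch of that two-currency model. The functional form of the Status cost, the numbers, and all the names are illustrative assumptions, not anything specified in the comment above.

```python
# Toy sketch of the two-currency model (all numbers, names, and functional
# forms here are illustrative assumptions).

from dataclasses import dataclass


@dataclass
class Decision:
    dollars_gain: float  # expected impact on company results (Dollars)
    status_gain: float   # expected impact on how I'm perceived (Status)
    difficulty: float    # how politically expensive the decision is to push through


def status_cost(difficulty: float, current_status: float) -> float:
    """Status cost of a decision, modeled as a decreasing function of current Status."""
    return difficulty / (1.0 + current_status)


def simulate(decisions: list[Decision], status: float = 1.0, dollars: float = 0.0):
    """Walk through decisions in order, banking Status when a decision grants
    more than it costs and spending it on later, harder decisions."""
    for d in decisions:
        cost = status_cost(d.difficulty, status)
        if cost > status:
            continue  # can't afford to push this decision through yet
        status += d.status_gain - cost
        dollars += d.dollars_gain
    return dollars, status


# An early Status-building decision (small Dollar sacrifice) makes a later
# high-Dollar, high-difficulty decision affordable.
decisions = [
    Decision(dollars_gain=-1.0, status_gain=3.0, difficulty=0.5),
    Decision(dollars_gain=10.0, status_gain=0.0, difficulty=3.0),
]
print(simulate(decisions))  # Dollars is the quantity being maximized long term
```

In this toy run, taking the early Status-building decision at a small Dollar cost is exactly what makes the later high-Dollar decision affordable, which is the “bank Status now, spend it later” dynamic described above.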
I think this type of thing should be common. Status is a resource that is used to acquire what you want, so in my mind there’s no shame in going after it.
How do time constraints play into this model?
Do you ever find yourself in situations where you would predict different things if you thought you were a pure-intellectual-curiosity-satisfier than if you thought you were in part a status-maximizer?
If so, is making more accurate predictions in such situations useful, or do accurate predictions not matter much?
I suspect that if I thought of myself as a pure-intellectual-curiosity-satisfier, I would be a lot more bewildered by my behavior and my choices than I am, and struggle with them a lot more than I do, and both of those would make me less happy.
If the way you seek status is ethical (“do good work” more than “market yourself as doing good work”) then you may not want to change anything once you discover your “true motivation”. And the alternative “don’t care about anything” hardly entices.