“If You’re Not a Holy Madman, You’re Not Trying”
I’ve been reading the hardcover SSC collection in the mornings, as a way of avoiding getting caught up in internet distractions first thing when I get up. I’d read many of Scott Alexander’s posts before, but nowhere near everything posted; and I hadn’t before made any attempt to dive into the archives to “catch up” to the seeming majority of rationalists who have read everything Scott Alexander has ever written.
(The hardcover SSC collection is nowhere near everything on SSC, not to mention Scott’s earlier squid314 blog on livejournal. I’m curious how much shelf space a more complete anthology would occupy.)
Anyway, this has gotten me thinking about the character of Scott Alexander’s writing. I once remarked (at a LessWrong meetup) that Scott Alexander “could never be a cult leader”. I intended this as a sort of criticism. Scott Alexander doesn’t write with conviction in the same way some other prominent rationalist authors do. He usually has the attitude of a bemused bystander who is merely curious about a bunch of things. Some others in the group agreed with me, but took it as praise: compared to some other rationalist authors, Scott Alexander isn’t an ideologue.
(now I fear 90% of the comments are going to be some variation of “cults are bad”)
What I didn’t realize (at the time) was how obsessed Scott Alexander himself is with this distinction. Many of his posts grapple with variations on the question of just how seriously we can take our ideas without going insane, contrasting the holy madman in the desert (who takes ideas 100% seriously) with the detached academic (who takes an intellectual interest in philosophy without applying it to life).
Beware Isolated Demands for Rigor is the post which introduces and seriously fleshes out this distinction. Scott says the holy madman and the detached academic are two valid extremes, because both of them are consistent in how they call for principles to be applied (the first always applies their intellectual standards to everything; the second never does). What’s invalid is when you use intellectual standards as a tool to get whatever you want, by applying the standards selectively.
Infinite Debt forges a middle path, praising Giving What We Can for telling people that you can just give 10% to charity and be an “Officially Recognized Good Person”—you don’t need to follow your principles all the way to giving away everything, or alternately, ignore your principles entirely. By following a simple collectively-chosen rule, you can avoid applying principles selectively in a self-serving (or overly not-self-serving) way.
Bottomless Pits Of Suffering talks about the cases where utilitarianism becomes uncomfortable and it’s tempting to ignore it.
But related ideas are in many other posts. It’s a thread which runs throughout Scott’s writing. (IMHO.)
This conflict is central to the human condition, or at least the WASP/WEIRD condition. I imagine most of Scott’s readers felt similar conflicts around applying their philosophies in practice.
But this is really weird from a decision-theoretic perspective. An agent should be unsure of principles, not sure of principles but unsure about applying them. (Related.)
It’s almost like Scott implicitly believes maximizing his own values would be bad somehow.
Some of this makes sense from a Goodhart perspective. Any values you explicitly articulate are probably not your values. But I don’t get the sense that this is what’s going on in Scott’s writing. For example, when he describes altruists selling all their worldly possessions, it doesn’t sound like he intends it as an example of Goodhart; it sounds like he intends it as a legit example of altruists maximizing altruist values.
In contrast, blogs like Minding our way to the heavens give me more of a sense of pushing the envelope on everything; I associate it with ideas like:
If you aren’t putting forth your full effort, it probably means this isn’t your priority. Figure out whether it’s worth doing at all, and if so, what the minimal level of effort to get what you want is. (Or, if it really is important, figure out what’s stopping you from giving it your full effort.) You can always put forth your full effort at the meta-level of figuring out how much effort to put into which things.
If you repeatedly don’t do things in line with your “values”, you’re probably wrong about what your values are; figure out what values you really care about, so that you can figure out how best to optimize those.
If you find that you’re fighting yourself, figure out what the fight is about, and find a way to best satisfy the values that are in conflict.
In more SSC-like terms, it’s like, if you’re not a holy madman, you’re not trying.
I’m not really pushing a particular side, here, I just think the dichotomy is interesting.
It’s not clear that people should be agents. Agents are means of setting up content of the world to accord with values, they are not optimized for being the valuable content of the world. So a holy madman has a work-life balance problem, they are an instrument of their values rather than an incarnation of them.
This is a very striking statement, and I want to flag it as excellent.
It seems to me that the agents you are considering don’t have as complex a utility function as people, who seem to at least in part consider their own well being as part of their utility function. Additionally, people usually don’t have a clear idea of what their actual utility function is, so if they want to go all-in on it, they let some values fall by the wayside. AFAIK this limitation is not a requirement for an agent.
If you had your utility function fully specified, I don’t think you could be considered both rational and also not a “holy madman”. (This borders on my answer to the question of free will, which so far as I can tell, is a question that should not explicitly be answered, so as to not spoil it for anyone who wants to figure it out for themselves.)
Suffice it to say that optimized/optimal function should be a convergent instrumental goal, similar to self-preservation, and a rational agent should thereby have it as a goal. If I am not mistaken, this means that a problem in work-life balance, as you put it, is not something that an actual rational agent would tolerate, provided there are options to choose from that don’t include this problem and have a similar return otherwise.
Or did I misinterpret what you wrote? I can be dense sometimes...^^
No, sounds right to me, at least approximately. It would be interesting to have theorems.
My position on free will is pretty developed, so I don’t think you’d be spoiling anything if you DMed me with that part of the thought.
I think there are a couple of responses the holy-madman type can give:
The holy-madman aesthetic is actually pretty nice. Human values include truth, which requires coherent thought. And in fiction, we especially enjoy heroes who go after coherent goals. So in practice in our current world, the tails don’t come apart much. At worst, people who manage to be more agentic aren’t making too big of a sacrifice in the incarnation department. And perhaps they’re actually better-off in that respect.
A coherent agent is basically what happens when you can split up the problem of deciding what to do and doing it, because most of the expected utility is in the rest of the world. An effective altruist who cares about cosmic waste probably thinks your argument is referring to something pretty negligible in comparison. Even if you argue functional decision theory means you’re controlling all similar agents, not just yourself, that could still be pretty negligible.
The nice things are skills and virtues, parts of designs that might get washed away by stronger optimization. If people or truths or playing chess are not useful/valuable, agents get rid of them, while people might have a different attitude.
(Part of the motivation here is in making sense of corrigibility. Also, I guess simulacrum level 4 is agency, but humans can’t function without a design, so attempts to take advantage of the absence of a design devolve into incoherence.)
Here’s a model for you:
Assume that Value Alignment is a single variable. We want to maximize it by optimizing our behaviors. But we have a limited budget for object-level action and any given meta-level of strategizing. For that reason, we iteratively strategize, act, and evaluate in sprints. During the sprint, we fully commit to the strategy. After the sprint, we demonstrate and evaluate the results, and plan the next sprint.
We assume that conditions and needs will be constantly changing in unpredictable ways, which we can only discover through this sort of iterated effort. A plan/sprint/review-like approach allows us to balance the need for adaptability with the need for forward motion.
From this point of view, the Holy Madman and the Detached Academic are both failing to implement an effective strategy. The Holy Madman has left out the part where you review and adapt; the Detached Academic has left out the part where you sprint. A third archetype, perhaps the Effective Altruist (?), brings the whole strategy together.
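A toy sketch of the plan/sprint/review model above, in Python. Everything numeric here is an assumption made up for illustration: the "right thing to do" drifts steadily over time (`target`), a review period re-aims the strategy at current conditions but produces no output, and an action period is worth less the further the strategy has drifted from the target.

```python
from typing import Optional


def target(t: int) -> float:
    """Hypothetical 'right thing to do' at period t (assumed to drift steadily)."""
    return 0.1 * t


def run(periods: int, review_every: Optional[int], acts: bool) -> float:
    """Total value produced over `periods` under a given review schedule.

    review_every: re-plan every this many periods (None = plan once, never review).
    acts: whether the agent commits effort between reviews.
    """
    strategy = target(0)  # initial plan
    total = 0.0
    for t in range(periods):
        if review_every is not None and t % review_every == 0:
            strategy = target(t)  # review: re-aim, but no output this period
            continue
        if acts:
            # Full commitment; value shrinks as the plan goes stale.
            total += max(0.0, 1.0 - abs(strategy - target(t)))
    return total


holy_madman = run(20, review_every=None, acts=True)     # never reviews
detached_academic = run(20, review_every=1, acts=False)  # only ever reviews
sprinter = run(20, review_every=5, acts=True)            # plan, sprint, review
```

Under these made-up numbers the sprinter comes out ahead of the holy madman, who in turn beats the detached academic. The ordering depends on the assumed drift rate and review cost, so it is an illustration of the argument rather than evidence for it.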
Some blogs seem to be for directing the short-term sprints. Others seem to be about long-term, tentative strategizing and noticing confusion. When you read Eliezer Yudkowsky’s writing, it makes you feel like doing something right now, but you’re also going to need to revisit it in a couple weeks to see if it’s holding up. Eliezer is not your pal—he’s your boss.
Reading Scott, there’s no sense of urgency. You don’t need to do anything. But his best writing sticks with you in the way that an easy friendship does.
I think that the approaches based on being a holy madman greatly underestimate the difficulty of being a value maximiser running on corrupted, basic human hardware.
I’d be extremely skeptical of anyone who claims to have found a way to truly maximise their utility function, even if they claim to have avoided all the obvious pitfalls of burning out and so on.
It would be extremely hard to reconcile “put forth your full effort” with staying rational enough to notice you’re burning yourself out, or to notice that you’re getting stuck in some suboptimal route because you’re not leaving yourself enough slack to notice better opportunities.
The detached academic seems to me an odd way to describe Scott Alexander, who seems to make a really effective effort to spread his values and live his life rationally. For him, most of the issues he talks about seem to be pretty practical and relevant, even if he often takes an interest in whatever makes him curious and isn’t dropping everything to work on AI (or to maximise the number of competent people who would work on AI).
I’m currently in a now-nine-months-long attempt to move from detached-lazy-academic to make an extraordinary effort.
So far every attempt to accurately predict how much of a full effort I can make without getting backlash that makes me worse at it in the next period has failed.
Lots of my plans have failed, so if I had gone along with plans that required me to make sacrifices, as taking an idea Seriously would require you to do, it would have left me at a serious loss.
What worked most and obtained the most results was keeping a curious attitude toward plans and subjects that are related to my goal, studying to increase my competence in related areas even if I don’t see any immediate way it could be of help, and monitoring how much “weight” I’m putting on the activities that produce the results I need.
I feel I started out being unbelievably bad at working seriously at something, but in nine months I got more results than in a lifetime (in a broad sense, not just related to my goal) and I feel like I went up a couple levels.
I try to avoid going toward any state that resembles a “holy madman” for fear of crashing hard, and I notice that what I’m doing already has me pass as one to even my most informed friends on related subjects, when I don’t censor to look normally modest and uninterested.
I might be just at such a low level in the skill of “actually working” that anything that would work great for a functional adult with a good work ethic is deadly to me.
But I’d strongly advise anyone trying the holy madman path to actively pump for as much “anti-holy-madmanness” as they can, since making the full effort to maximise for something seems to me the best way to make sure your ambition burns through any defence your naive, optimistic plans think you have put in place to protect your rationality and your mental health.
Cults are bad, becoming a one-man-cult is entirely possible and slightly worse.
I bet if you said this to Nate he’d have a pretty convincing counter. Even though Nate works some ridiculous number of hours a week (in contrast to me; I’m closer to the standard 40 hours), I suspect he has enough slack, and thinks of this as part of the optimization problem.
Part of the skill of optimizing without shooting yourself in the foot is explicitly counting slack as part of the optimization problem.
Part of the meta-skill of learning to do this is always asking yourself whether you’re falling into some kind of trap (mostly, forms of Goodhart), and prioritizing steps which avoid traps. EG if you were a self-modifying AGI, you would do well to self-modify in a cautious way, rather than as soon as something looks positive-EV.
However, I’m not sure whether this caution eventually cashes out to “don’t be a holy madman” vs “here’s how to be the right kind of holy madman”, in the terms of the post.
Yeah, I feel that I can similarly look back at my history and say that in several cases, it either has been better or would have been much better to be more the detached academic.
Mh… I guess “holy madman” is too vague a definition to have a rational debate on? I had interpreted it as “sacrifice everything that won’t negatively affect your utility function later on”. So the interpretation I imagined would be someone that won’t leave himself an inch of comfort more than what’s needed to keep the quality of his work constant.
I see slack as leaving yourself enough comfort that you’d be ready to use your free energy in ways you can’t see at the moment, so I guess I was automatically assuming a “holy madman” would optimise for outputting the current best effort he can in the long term, rather than sacrificing some current effort to bet on future chances to improve the future output.
I’d define someone who’s leaving this level of slack as someone who’s making a serious or full effort, but not a holy madman, though I guess this doesn’t mean much.
If I were to try to summarise my thoughts on what would happen in reality if someone were to try these options… I think the slack one would work better in general, both by managing to avoid pitfalls and to better exploit your potential for growth.
I still feel there’s a lot of danger to oneself in trying to take ideas seriously though. If you start trying to act like it’s your responsibility to solve a problem that’s killing people, the moment you lose your grip on your thoughts is the moment you cut yourself badly, at least in my experience.
These days I’ve managed to reduce the harm that some recurrent thoughts were doing by focusing on distinguishing between 1) me legitimately wanting A and planning/acting to achieve A and 2) my worries related to not being able to get A or distress for things currently being not A, telling myself that 2) doesn’t help me get what I want in the least, and that I can still make a full effort for 1), likely a better one, without paying 2) much attention.
(I’m afraid I’ve started to slightly rant from this point. I’m leaving it because I still feel it might be useful)
This strategy worked for my gender transition.
I’m not sure how I’d react if I were to try telling myself I shouldn’t care/feel bad/worry if people die because I’m not managing to fix the problem, even if I KNOW that worrying myself about people dying hinders my effort to fix the problem because feeling sick and worried and tired wouldn’t in any way help toward actually working on the problem, I still don’t trust my corrupted hardware to not start running some guilt trip against me because I’m trying to be, in a sense that’s not utilitarian at all, callous, because I’m trying to not care/feel bad/worry about something like that.
Also, as a personal anecdote of possible pitfalls, trying to take personal responsibility for a global problem had drained my resources in ways I couldn’t easily have foreseen. When I got jumped by an unrelated problem about my gender, I found myself without the emotional resources to deal with both stresses at once, so some recurrent thoughts started blaming me because I was letting a personal problem that was in no way as bad as being dead, and didn’t blip at all on any screen in comparison to a large number of deaths, screw up my attempt at working on something that was actually relevant. I realised immediately that this was a stupid thing to think and in no way healthy, but that didn’t do much to stop it, and climbing out of that pit of stress and guilt took a while.
In short, my emotional hardware is stupid and bugged and it irritates me to no end how it can just go ahead and ignore my attempts at thinking sanely about stuff.
I’m not sure if I’m just particularly bad at this, or if I just have expectations that are too high. An external view would likely tell me that it’s ridiculous for me to expect to be able to go from “lazy and detached” to “saving the world (read reducing X risk), while effortlessly holding at bay emotional problems that would trip most people”. I’d surely tell anyone that. On the other hand, it just feels like a stupid thing to not manage doing.
(end of the rant)
Can I ask if you have some sort of external force that makes you do these hours? If not, any advice on how to do that?
I’m coming from a really long tradition of not doing any work whatsoever, and so far I’m struggling to meet my current goal of 24 hours (also because the only deadlines are the ones I manage to give myself… and for reasons I guess I have explained above).
Getting to this was a massive improvement, but again, I feel like I’m exceptionally bad at working hard.
Goodharting is one thing, another thing is short-term (first-order) consequences vs long-term (second-order) consequences.
Imagine that you are the only altruist ever existing in the universe. You cannot reproduce or make your copy or spread your values. Furthermore, you are terminally ill and you know for sure that you will die in a week.
From that perspective, it would make sense to sell all your worldly possessions, spend the money to create as much good as you can, and die knowing you created the most good possible, and while it is sad that you couldn’t do more, it cannot be helped.
(Note that this thought experiment does not require you to be perfectly altruistic. Not only are you allowed to care about yourself, you are even allowed to care about yourself more than about the others. Suppose you value yourself as much as the rest of universe together. That still makes it simple: spend 50% of your money to make the remaining week as pleasurable for yourself as possible, and the remaining 50% to improve the world as much as possible.)
We do not live in such a situation though. There are many people who feel altruistic to a smaller or greater degree, and what any specific one of them does is most likely just a drop in the ocean. The drop may be even smaller than the waves it creates. Maybe instead of becoming e.g. a lawyer and donating your entire salary to charity, you could become e.g. a teacher or a writer, and influence many other people, so that they become lawyers and donate their salaries to charity… thus indirectly contributing to charities much more than you could do alone.
Of course this approach contains its own risk of going too meta—if literally everyone who ever feels altruistic becomes a teacher or a writer, and spends their whole salary on flyers promoting effective altruism, that would mean that the charity actually gets nothing at all. (Especially if it becomes common belief that being a meta-altruist is much better—i.e. higher status—than being a mere object-level altruist.)
The effect Scott probably worries about is the following: Should it become known that altruists generally live happy lives, or should it become known that altruists generally suffer a lot in order to maximize the global good? In the short term, the latter creates more good: optimizing for charity gives more to charity than optimizing for a combination of charity and self-preservation. But in the long term, don’t be surprised if people who are generally willing to help others, but have a strong self-preservation instinct, decide that this altruism thing is not for them. A suffering altruist is an anti-advertisement for altruism. Therefore, in the name of maximizing the global good (as opposed to maximizing the good created personally by themselves) an effective altruist should strive to live a happy life! Because that attracts more people to become effective altruists, and more altruists can together create more good. But you should still donate some money, otherwise you are not an altruist.
So we have a collective problem of finding a function f such that if we make it a social norm that each altruist x should donate f(x), the total amount donated to charities is maximized. It should be sufficiently high so that money actually is donated, and sufficiently low so that people are not discouraged from becoming altruists. And it seems like “donate 10% of your income” is a very good rule from this perspective.
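A toy sketch of that trade-off, in Python. The participation curve is purely an assumption invented for illustration (a linear drop-off in how many would-be altruists sign up as the asked-for fraction rises, reaching zero at 40%), as are the population and income figures; the simplification that f is a single flat rate is also mine.

```python
def participation(rate: float) -> float:
    """Hypothetical share of would-be altruists who accept a norm of donating `rate`.

    Assumption for illustration only: everyone joins at 0%, nobody at 40% or more.
    """
    return max(0.0, 1.0 - rate / 0.40)


def total_donated(rate: float, population: int = 1000, income: float = 50_000.0) -> float:
    """Total donated if every participant gives `rate` of the same (assumed) income."""
    return population * participation(rate) * income * rate


def best_rate(candidates):
    """Return the candidate rate that maximizes the total donated."""
    return max(candidates, key=total_donated)


rates = [i / 100 for i in range(0, 41)]  # 0%, 1%, ..., 40%
best = best_rate(rates)
```

Both extremes yield nothing: at 0% nobody gives anything, and at 40% nobody participates. With this made-up curve the maximum lands at 20%; a steeper drop-off in participation would push it lower, so the sketch says nothing about whether 10% is the right number in reality.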
Right, I agree with your distinction. I was thinking of this as something Scott was ignoring, when he wrote about selling all your possessions. I don’t want to read into it too much, since it was an offhand example of what it would look like to go all the way in the taking-altruism-seriously direction. But it does seem like Scott (at the time) implicitly believed that going too far would include things of this sort. (That’s the point of his example!) So when you say:
I’m like, no, I don’t think Scott was explicitly reasoning this way. Infinite Debt was not about how altruists need to think long-term about what does the most good. It was a post about how it’s OK not to do that all the time, and principles like altruism shouldn’t be allowed to ask arbitrarily much from us. Yes, you can make an argument “thinking about the long-term good all the time isn’t the best way to produce the most long-term good” and “asking people to be as good as possible isn’t the best way to get them to be as good as possible” and things along those lines. But for better or worse, that’s not the argument in the post.
IMO the source of this apparent conflict is that we pretend that our values and beliefs are something different from our actual (unconscious) values and beliefs. The “conflict” is either just play-acting about how we take those pretense values seriously, or an attempt to justify the contradiction between stated and revealed preferences without giving up on the pretense.
Right, I think this is a pretty plausible hypothesis.
Here’s another perspective: Scott is writing from the perspective of (something like) the memes, who exert some control but don’t have root access. The memes have a lot of control over when we feel good or bad about ourselves (this is a primary control mechanism they have). But the underlying biological organism has more control over what we actually do or don’t do.
The memes also don’t have a great deal of self-awareness of this split agency. They see themselves as the biological organism. So they’re actually a bit puzzled about why the organism doesn’t maximize the memetic values all the time.
One strategy which the memes use, in response to this situation, is to crank up the guilt-o-meter whenever actions don’t reflect explicitly endorsed values.
Scott and Nate are both arguing against this strategy. Scott’s SSC perspective is something like: “Don’t feel guilty all the time. You don’t have to go all the way with your principles. It’s OK to apply those principles selectively, so long as you make sure you’re not doing it in a biased way to get what you want.”
This is basically sympathetic to the “you should feel guilty if you do bad things” idea, but arguing about how to set the threshold.
Nate’s Minding Our Way perspective is instead: “Guilt isn’t an emotion that a unified agent would feel. So you must be a fractured agent. You’re at war with yourself; what you need is a peace treaty. Work to recognize your fractured architecture, and negotiate better and better treaties. After a while you’ll be acting like a unified agent.”
Just a note that these are based on the SlateStarCodexAbridged edition of SSC:
https://www.slatestarcodexabridged.com/
And just to clarify what that means, from their website:
I agree this is a common thread in Scott’s writing (though I bet I’ve read less than you did). As Tim Urban remarked recently, Scott is a master at conveying his confidence level in his writing. He knows both how to write with conviction when he’s very confident, and how to convey his uncertainty when he’s uncertain. It may come from confidence in his calibration about a claim instead of in the claim itself. It sounds much harder to write a post arguing that we should believe X with 80% confidence than just a post arguing that it’s true. And these are exactly the sort of posts Scott is exceptionally good at.
P.S: Cults are bad :)
Minding Our Way addresses this very phenomenon in Confidence All The Way Up. To my eye, Scott Alexander articulates his uncertainty with an air of meta-uncertainty; even when he sounds certain, he sounds tentatively uncertain. For example, his posts sometimes proceed in sections where each tells a strong story, but the next section contradicts the story, telling a new story from an opposite perspective. This gives a sense that no matter how strong an argument is, it could be knocked down by an even stronger argument which blindsides you. This kind of thing is actually another obsession of Scott’s (by my estimation).
In contrast, Nate Soares articulates his uncertainty with an air of meta-confidence; he’s uncertain, but he knows a lot about where that uncertainty comes from and what would change his mind. He can put numbers to it. If he’s not sure about what would change his mind, he can tell you about how he would figure it out. And so on.
Another Insanity Wolf meme!
I don’t agree. Or at least, I think there’s some level-crossing here of the axiology/morality/legality type (personally I’ve started to think of that as a 5 level distinction instead, axiology/metaethics/morality/cultural norms/legality). I see it as equivalent to saying you shouldn’t design an airplane using only quantum field theory. Not because it would be wrong, but because it would be intractable. We, as embodied beings in the world, may have principles we’re sure of—principles that would, if applied, accurately compare world states and trajectories. These principles may be computationally intractable given our limited minds, or may depend on information we can’t reliably obtain. So we make approximations, and try to apply them while remembering that they’re approximations and occasionally pausing when things look funny to see if the approximations are still working.
What would the principles we’re sure of be?
To clarify: I don’t think there are principles expressible in reasonable-length English sentences that we should be sure of. I actually think no such sentence can be “right” in the sense of conforming-to-what-we-actually-believe. But, I do think there is some set of underlying principles, instantiated in our minds, that we use in practice to decide what events or world states or approximate-and-expressible-principles are good or bad, or better or worse, and to what degree. I use my built-in “what’s good?” sense to judge the questions that get asked further down in the hierarchy of legibility.
So let’s call these the X-principles. You seem to say:
The X-principles are what we use in practice, to decide what events are good or bad.
It would be too hard to use the X-principles to entirely guide our decisions, in the same way that it would be too hard to use quantum mechanics to build airplanes.
We can be completely sure of the X-principles.
The X-principles are “instantiated in our minds”
I think there are some principles “instantiated in our minds” which we in practice behave as if we are sure of, IE, we simply do make decisions according to. Let’s call these the bio-principles. I don’t think we should be 100% sure of these principles (indeed, they are often wrong/suboptimal).
I think there are some principles we aspire to, which we are in the process of constructing throughout life (and also in conversation with a cross-generational project of humans articulating human values). Call these the CEV-principles; the reflectively consistent principles which we could arrive at eventually. These are “instantiated in our minds” in some weak sense, sort of like saying that a program which could crack cryptography given sufficient time “instantiates” the secret key which it would eventually find if you run it for long enough. But I think perhaps even worse than that, because some of the CEV-principles require interacting with other people and the wider world in order for us to find them.
I think saying that we can be completely sure of the CEV-principles is a map/territory error. We are currently uncertain about what these principles are. Even once we find them, we would probably still maintain some uncertainty about them.
Your X-principles sound somewhere between these two, and I’m not sure how to make sense of that.
With respect to my original point you were critiquing,
I would be happy to restrict the domain of this claim to principles which we can articulate. I was discussing bloggers like Scott Alexander, so I think the restriction makes sense.
So, for example, consider a utilitarian altruist who is very sure that utilitarian altruism is morally correct. They might not be sure why. They might have non-articulable intuitions which underlie their beliefs. But they have some explicit beliefs which they are 99.9% confident in. These beliefs may lead to some morally counterintuitive conclusion, EG, that murder is correct when the benefits outweigh the costs.
So, what is my claim (the claim that you were disagreeing with) in this context?
Scott Alexander is saying something like: we can accept the premise (that utilitarianism is correct) from an intellectual standpoint, but yet, not go around murdering people when we think it is utilitarian-correct to do so. Scott thinks people should be consistent in how they apply principles, but, he doesn’t think the best way to be consistent is clearly “always apply principles you believe in”. He doesn’t want the utilitarian altruist to be eaten alive by their philosophy; he thinks giving 10% can be a pretty good solution.
Nate Soares is saying something like: if we’re only giving 10% to something we claim to be 99.9% sure of, we’re probably not as sure as we claim we are, or else we’re making a plain mistake.
(Keep in mind I’m using “nate” and “scott” here to point to a spectrum; not 100% talking about the real nate & scott.)
My claim is that Nate’s position is much less puzzling on classical decision-theoretic grounds. Beliefs are “for” decisionmaking. If you’re putting some insulation between your beliefs and your decisions, you’re probably acting on some hidden beliefs.
I have some sympathy with the Nate side. It feels a bit like the Scott position is doing separation of concerns wrong. If your beliefs and your actions disagree, I think it better to revise one or the other, rather than coming up with principles about how it’s fine to say one thing and do another. But I’m also not claiming I’m 100% on the Nate side of this spectrum. To say it is “puzzling from a decision-theoretic perspective” is not to say it is wrong. It might just as easily be a fault of classical decision theory, rather than a fault of Scott’s way of thinking. See, EG, geometric rationality.
Does this clarify my position? I’m curious what you still might disagree with, and for you to say more about the X-principles.
I agree with this.
I see it more as, not Scott, but human minds doing separation of concerns wrong. A well-designed mind would probably work differently, plausibly more in line with decision-theoretic assumptions, but you go to war with the army you have. What I have is a brain, coughed up by evolution, built from a few GB of source code, trained on a lifetime of highly redundant low-resolution sensory data, and running on a few tens of watts of sugar. How I should act is downstream of what I happen to be and what constraints I’m forced to optimize under.
I think the idea of distinguishing CEV-principles as a separate category is a good point. Suppose we follow the iterative-learning-over-a-lifetime to its logical endpoint, and assume an agent has crafted a fully-fleshed-out articulable set of principles that they endorse reflectively in 100% of cases. I agree this is possible and would be very excited to see the result. If I had it, what would this mean for my actions?
Well, what I ideally want is to take the actions that the CEV-principles say are optimal. But, I am an agent with limited data and finite compute, and I face the same kind of tradeoffs as an operating system deciding when (and for how long) to run its task scheduler. At one extreme, it never gets run, and whatever task comes along first gets run until it quits. At the other extreme, it runs indefinitely, and determines exactly what action would have been optimal, but not until long after the opportunity to use that result has passed. Both extremes are obviously terrible. In between are a global optimum and some number of local optima, but you only ever have estimates of how close you are to them, and estimates of how much it would cost (in compute or in data acquisition effort) to get better estimates.
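The tradeoff in the paragraph above can be made concrete with a toy model. This is only a sketch under invented assumptions: the function names, the diminishing-returns curve, and the linearly shrinking window of opportunity are all hypothetical, chosen just to show that both extremes score zero while some intermediate amount of deliberation does better. Unlike the real situation, this toy curve has a single peak and is known exactly; an actual agent would only have estimates of it.

```python
import math


def decision_quality(t, max_value=100.0, rate=0.5):
    """Hypothetical value of the chosen action after t units of deliberation.

    Assumes diminishing returns: more thought helps, but less and less.
    """
    return max_value * (1.0 - math.exp(-rate * t))


def opportunity_remaining(t, deadline=10.0):
    """Hypothetical fraction of the opportunity still usable after t units.

    Assumes the window closes linearly and is gone at the deadline.
    """
    return max(0.0, 1.0 - t / deadline)


def net_value(t):
    """Value actually captured: decision quality scaled by what is left to act on."""
    return decision_quality(t) * opportunity_remaining(t)


def best_deliberation_time(candidates):
    """Pick the candidate deliberation time with the highest net value."""
    return max(candidates, key=net_value)


# Candidate deliberation times from 0.0 to 10.0 in steps of 0.5.
candidates = [i * 0.5 for i in range(21)]
best = best_deliberation_time(candidates)

print("never deliberate:", net_value(0.0))
print("deliberate until the deadline:", net_value(10.0))
print("best candidate time:", best, "net value:", net_value(best))
```

Changing the assumed `rate` or `deadline` moves the peak, which is the point: how long it is worth thinking depends on parameters the agent has to estimate rather than observe.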
Given that, what I can actually do is make a series of approximations that are more tractable and rapidly executable, and that are usually close to optimal in the conditions where I usually need to apply them, knowing that those approximations are liable to break in extreme cases. I then deliberately avoid pushing those approximations so hard that I predict they would be Goodharted in ways I have a hard time foreseeing. Even my CEV-principles would (I expect) endorse this, because they would necessarily contain terms for the cost of devoting more resources to making better decisions.
So, from my POV, I have an implicitly encoded seed for generating my CEV-principles, which I use to internalize and reflect on a set of meta-ethical general principles, which I use to generate a set of moral principles to guide actions. I share many of those (but not all) with my society, which also has informal norms and formal laws. Each step in that chain smooths out and approximates the potentially unboundedly complex edges of the underlying CEV-principles, in order to accommodate the limited compute budget allocated to judging individual cases.
I think one of the reasons for moral progress over time is that, as we become wealthier and get better available data, we can make and evaluate and act on less crude approximations, individually and societally. I suspect this is also a part of why smarter people are, on average, more trusting and prosocial (if the studies I’ve read that say this are wrong, please let me know!).
This doesn’t mean no one should ever become a holy madman. It just means the bar for doing so should be set higher than a simple expected value calculation would suggest. Similarly, in business, sometimes the right move is to bet the company, and in war, sometimes the right move is one that risks the future of your civilization. But, the bar for doing either needs to be very high, much higher than just “This is the highest expected payoff move we can come up with.”
What do you mean by “better” here?
For humans (or any other kinds of agents) that live in the physical world as opposed to idealized mathematical universes, the process of explicitly revising beliefs (or the equivalent action-generators in the latter case) imposes costs in terms of the time and energy necessary to make the corrections. Since we are limited beings that often go awry because of biases, misconceptions, etc., we would need to revise everything constantly and consequently spend a ton of time just ensuring that our (conscious, S2-endorsed) beliefs match our actions.
But if you try to function on the basis of these meta-principles that say “in one case, think about it this way; in another case, think about it this other way (which is actually deeply incompatible with the first one) etc,” you only need to pay the cost once: at the moment you find and commit to the meta-principles. Afterwards, you no longer need to worry about ensuring that everything is coherent and that the different mindsets you use are in alignment with one another; you just plop whatever situation you find yourself confronted with into the meta-principle machine and it spits out which mindset you should select.
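The cost argument above is essentially an amortization claim, and a rough sketch may make it easier to see when it holds. Everything here is a hypothetical illustration: the function names and the cost parameters are invented, and the small per-situation `lookup_cost` is an added assumption (the comment above treats the meta-principle as costing nothing after the one-time commitment, which corresponds to a lookup cost of zero).

```python
import math


def cost_revise_each_time(num_situations, cost_per_revision):
    """Total cost if beliefs and actions are reconciled afresh in every situation."""
    return num_situations * cost_per_revision


def cost_meta_principle(num_situations, upfront_cost, lookup_cost=0.0):
    """Total cost if a meta-principle is committed to once and then only consulted."""
    return upfront_cost + num_situations * lookup_cost


def break_even_situations(upfront_cost, cost_per_revision, lookup_cost=0.0):
    """Smallest number of situations at which the meta-principle is strictly cheaper.

    Returns None if consulting the meta-principle is no cheaper per situation
    than revising, in which case the upfront cost is never recovered.
    """
    saving_per_situation = cost_per_revision - lookup_cost
    if saving_per_situation <= 0:
        return None
    return math.floor(upfront_cost / saving_per_situation) + 1


# Made-up numbers: an expensive one-time commitment, cheap lookups afterwards.
n = break_even_situations(upfront_cost=50, cost_per_revision=5, lookup_cost=1)
print("meta-principle becomes cheaper from situation number", n)
```

The sketch only compares effort spent. It says nothing about the quality of the resulting decisions, which is the other half of the question raised in this thread.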
So I can agree that revising one of your beliefs and actions to ensure that they agree generates an important benefit, but the more important question is whether that benefit overcomes the associated cost I just mentioned, given the fundamental and structural imperfections of the human mind. I suspect Scott thinks it does not: he would probably say it would be axiologically good if you could do so (the world-state in which you make your beliefs and actions coherent is “better” than the one in which you don’t, all else equal), but because all else is not equal in the reality we live our lives in, it would not be the best option to choose for virtually all humans.
(Upon reflection, it could be that what I am saying here is totally beside the point of your initial dialogue with Anthony, and I apologize if that’s the case)
Thanks for writing this out. I’m more sympathetic to Nate Soares’s view and wish more rationalists would take action on their beliefs. This is useful for pointing to the distinction that exists.