But this is really weird from a decision-theoretic perspective. An agent should be unsure of principles, not sure of principles but unsure about applying them.
I don’t agree. Or at least, I think there’s some level-crossing here of the axiology/morality/legality type (personally I’ve started to think of that as a 5 level distinction instead, axiology/metaethics/morality/cultural norms/legality). I see it as equivalent to saying you shouldn’t design an airplane using only quantum field theory. Not because it would be wrong, but because it would be intractable. We, as embodied beings in the world, may have principles we’re sure of—principles that would, if applied, accurately compare world states and trajectories. These principles may be computationally intractable given our limited minds, or may depend on information we can’t reliably obtain. So we make approximations, and try to apply them while remembering that they’re approximations and occasionally pausing when things look funny to see if the approximations are still working.
What would the principles we’re sure of be?
To clarify: I don’t think there are principles expressible in reasonable-length English sentences that we should be sure of. I actually think no such sentence can be “right” in the sense of conforming-to-what-we-actually-believe. But, I do think there is some set of underlying principles, instantiated in our minds, that we use in practice to decide what events or world states or approximate-and-expressible-principles are good or bad, or better or worse, and to what degree. I use my built-in “what’s good?” sense to judge the questions that get asked further down in the hierarchy of legibility.
So let’s call these the X-principles. You seem to say:
The X-principles are what we use in practice, to decide what events are good or bad.
It would be too hard to use the X-principles to entirely guide our decisions, in the same way that it would be too hard to use quantum mechanics to build airplanes.
We can be completely sure of the X-principles.
The X-principles are “instantiated in our minds”.
I think there are some principles “instantiated in our minds” which we in practice behave as if we are sure of, IE, we simply do make decisions according to. Let’s call these the bio-principles. I don’t think we should be 100% sure of these principles (indeed, they are often wrong/suboptimal).
I think there are some principles we aspire to, which we are in the process of constructing throughout life (and also in conversation with a cross-generational project of humans articulating human values). Call these the CEV-principles; the reflectively consistent principles which we could arrive at eventually. These are “instantiated in our minds” in some weak sense, sort of like saying that a program which could crack cryptography given sufficient time “instantiates” the secret key which it would eventually find if you run it for long enough. But I think perhaps even worse than that, because some of the CEV-principles require interacting with other people and the wider world in order for us to find them.
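To spell out the sense of “instantiates” in that analogy, here is a minimal brute-force sketch (the hash, the key, and the key length are all made up for the example):

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def crack(target_hash, key_length=4):
    """Brute-force a short lowercase key whose SHA-256 matches target_hash.

    The key is "instantiated" in this program only in the weak sense that
    the program would eventually find it; nothing here contains the key
    explicitly, and extracting it costs the whole search.
    """
    for candidate in product(ascii_lowercase, repeat=key_length):
        key = "".join(candidate)
        if hashlib.sha256(key.encode()).hexdigest() == target_hash:
            return key
    return None

# "cevp" is a made-up secret for this example; recovering it means paying
# the full search cost, even though the program "had it" all along.
secret_hash = hashlib.sha256(b"cevp").hexdigest()
print(crack(secret_hash))  # -> cevp
```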
I think saying that we can be completely sure of the CEV-principles is a map/territory error. We are currently uncertain about what these principles are. Even once we find them, we would probably still maintain some uncertainty about them.
Your X-principles sound somewhere between these two, and I’m not sure how to make sense of that.
With respect to my original point you were critiquing,
But this is really weird from a decision-theoretic perspective. An agent should be unsure of principles, not sure of principles but unsure about applying them.
I would be happy to restrict the domain of this claim to principles which we can articulate. I was discussing bloggers like Scott Alexander, so I think the restriction makes sense.
So, for example, consider a utilitarian altruist who is very sure that utilitarian altruism is morally correct. They might not be sure why. They might have non-articulable intuitions which underlie their beliefs. But they have some explicit beliefs which they are 99.9% confident in. These beliefs may lead to some morally counterintuitive conclusion, EG, that murder is correct when the benefits outweigh the costs.
So, what is my claim (the claim that you were disagreeing with) in this context?
Scott Alexander is saying something like: we can accept the premise (that utilitarianism is correct) from an intellectual standpoint, and yet not go around murdering people when we think it is utilitarian-correct to do so. Scott thinks people should be consistent in how they apply principles, but, he doesn’t think the best way to be consistent is clearly “always apply principles you believe in”. He doesn’t want the utilitarian altruist to be eaten alive by their philosophy; he thinks giving 10% can be a pretty good solution.
Nate Soares is saying something like: if we’re only giving 10% to something we claim to be 99.9% sure of, we’re probably not as sure as we claim we are, or else we’re making a plain mistake.
(Keep in mind I’m using “nate” and “scott” here to point to a spectrum; not 100% talking about the real nate & scott.)
My claim is that Nate’s position is much less puzzling on classical decision-theoretic grounds. Beliefs are “for” decisionmaking. If you’re putting some insulation between your beliefs and your decisions, you’re probably acting on some hidden beliefs.
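To make that concrete, here is a toy expected-value calculation (the numbers and both “value” curves are invented caricatures, not Scott’s or Nate’s actual model):

```python
# Toy model: an agent with credence p in a "strong utilitarian" view and
# 1 - p in a common-sense view chooses what fraction f of income to give.
# Both value curves below are invented purely for illustration.

def utilitarian_value(f):
    return 100.0 * f  # invented: every extra bit given helps others a lot

def commonsense_value(f):
    # invented: 10% is plenty, going far beyond it mostly costs you
    return 10.0 * min(f, 0.10) - 5.0 * max(f - 0.10, 0.0)

def expected_value(f, p):
    return p * utilitarian_value(f) + (1 - p) * commonsense_value(f)

for p in (0.999, 0.9, 0.5):
    best_f = max((i / 100 for i in range(101)), key=lambda f: expected_value(f, p))
    print(f"credence {p:.3f} in utilitarianism -> give {best_f:.0%}")

# With these made-up curves, even modest credence in the utilitarian view
# already recommends giving nearly everything. So "99.9% sure, but I give
# 10%" suggests the stated 99.9% isn't the number actually driving action.
```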
I have some sympathy with the Nate side. It feels a bit like the Scott position is doing separation of concerns wrong. If your beliefs and your actions disagree, I think it better to revise one or the other, rather than coming up with principles about how it’s fine to say one thing and do another. But I’m also not claiming I’m 100% on the Nate side of this spectrum. To say it is “puzzling from a decision-theoretic perspective” is not to say it is wrong. It might just as easily be a fault of classical decision theory, rather than a fault of Scott’s way of thinking. See, EG, geometric rationality.
Does this clarify my position? I’m curious what you still might disagree with, and for you to say more about the X-principles.
My claim is that Nate’s position is much less puzzling on classical decision-theoretic grounds. Beliefs are “for” decisionmaking. If you’re putting some insulation between your beliefs and your decisions, you’re probably acting on some hidden beliefs.
I agree with this.
It feels a bit like the Scott position is doing separation of concerns wrong. If your beliefs and your actions disagree, I think it better to revise one or the other, rather than coming up with principles about how it’s fine to say one thing and do another.
I see it more as, not Scott, but human minds doing separation of concerns wrong. A well-designed mind would probably work differently, plausibly more in line with decision-theoretic assumptions, but you go to war with the army you have. What I have is a brain, coughed up by evolution, built from a few GB of source code, trained on a lifetime of highly redundant low-resolution sensory data, and running on a few tens of watts of sugar. How I should act is downstream of what I happen to be and what constraints I’m forced to optimize under.
I think the idea of distinguishing CEV-principles as a separate category is a good point. Suppose we follow the iterative-learning-over-a-lifetime process to its logical endpoint, and assume an agent has crafted a fully-fleshed-out articulable set of principles that they endorse reflectively in 100% of cases. I agree this is possible and would be very excited to see the result. If I had it, what would this mean for my actions?
Well, what I ideally want is to take the actions that the CEV-principles say are optimal. But, I am an agent with limited data and finite compute, and I face the same kind of tradeoffs as an operating system deciding when (and for how long) to run its task scheduler. At one extreme, it never gets run, and whatever task comes along first gets run until it quits. At the other extreme, it runs indefinitely, and determines exactly what action would have been optimal, but not until long after the opportunity to use that result has passed. Both extremes are obviously terrible. In between are a global optimum and some number of local optima, but you only ever have estimates of how close you are to them, and estimates of how much it would cost (in compute or in data acquisition effort) to get better estimates.
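To make the scheduler analogy concrete, here is a toy version of the tradeoff (both curves are invented for illustration):

```python
import math

# Invented curves: deliberating longer improves the decision with
# diminishing returns, while the opportunity to act decays as you wait.

def decision_quality(t):
    return 1.0 - math.exp(-t)

def opportunity_remaining(t):
    return math.exp(-0.5 * t)

def realized_value(t):
    return decision_quality(t) * opportunity_remaining(t)

best_t = max((i / 100 for i in range(1001)), key=realized_value)
print(f"deliberate for t = {best_t:.2f}, realized value = {realized_value(best_t):.3f}")

# Neither t = 0 (act on the first impulse) nor very large t (a perfect
# decision, delivered too late) is optimal; the interior optimum is what
# the approximations are trying to land near, using only estimates of
# curves like these.
```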
Given that, what I can actually do is make a series of approximations that are more tractable and rapidly executable, and that are usually close to optimal in the conditions I usually need to apply them, knowing that those approximations are liable to break in extreme cases. I then deliberately avoid pushing those approximations too hard in directions where I predict they would get Goodharted in ways I have a hard time predicting. Even my CEV-principles would (I expect) endorse this, because they would necessarily contain terms for the cost of devoting more resources to making better decisions.
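And a toy version of why I don’t push those approximations too hard (the functions are invented, purely to show the shape of the failure):

```python
# Invented example: a proxy rule that tracks the "true" value well in the
# ordinary range comes apart when optimized into extreme territory.

def true_value(x):
    return x - 0.02 * x ** 3  # helps at first, backfires at extremes

def proxy_value(x):
    return x  # the tractable approximation: "more is simply better"

for x in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}   proxy = {proxy_value(x):6.2f}   true = {true_value(x):7.2f}")

# In the ordinary range the proxy is a fine guide; an optimizer that pushes
# the proxy to x = 10 is confidently walking off a cliff. Hence: use the
# approximation, but don't push it into the regime where you predict it
# gets Goodharted.
```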
So, from my POV, I have an implicitly encoded seed for generating my CEV-principles, which I use to internalize and reflect on a set of meta-ethical general principles, which I use to generate a set of moral principles to guide actions. I share many of those (but not all) with my society, which also has informal norms and formal laws. Each step in that chain smooths out and approximates the potentially unboundedly complex edges of the underlying CEV-principles, in order to accommodate the limited compute budget allocated to judging individual cases.
I think one of the reasons for moral progress over time is that, as we become wealthier and get better available data, we can make and evaluate and act on less crude approximations, individually and societally. I suspect this is also a part of why smarter people are, on average, more trusting and prosocial (if the studies I’ve read about that say this are wrong, please let me know!).
This doesn’t mean no one should ever become a holy madman. It just means the bar for doing so should be set higher than a simple expected value calculation would suggest. Similarly, in business, sometimes the right move is to bet the company, and in war, sometimes the right move is one that risks the future of your civilization. But, the bar for doing either needs to be very high, much higher than just “This is the highest expected payoff move we can come up with.”
I have some sympathy with the Nate side. It feels a bit like the Scott position is doing separation of concerns wrong. If your beliefs and your actions disagree, I think it better to revise one or the other, rather than coming up with principles about how it’s fine to say one thing and do another.
What do you mean by “better” here?
For humans (or any other kinds of agents) that live in the physical world as opposed to idealized mathematical universes, the process of explicitly revising beliefs (or the equivalent action-generators in the latter case) imposes costs in terms of the time and energy necessary to make the corrections. Since we are limited beings that often go awry because of biases, misconceptions, etc., we would need to revise everything constantly and consequently spend a ton of time just ensuring that our (conscious, S2-endorsed) beliefs match our actions.
But if you try to function on the basis of these meta-principles that say “in one case, think about it this way; in another case, think about it this other way (which is actually deeply incompatible with the first one) etc,” you only need to pay the cost once: at the moment you find and commit to the meta-principles. Afterwards, you no longer need to worry about ensuring that everything is coherent and that the different mindsets you use are in alignment with one another; you just plop whatever situation you find yourself confronted with into the meta-principle machine and it spits out which mindset you should select.
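As a cartoon of the “meta-principle machine” (the situation features and the rules here are invented purely for illustration; real ones would be far messier):

```python
# Toy dispatcher: pay the design cost once, then cheaply route each
# situation to a mindset instead of re-deriving everything from scratch.

def choose_mindset(situation):
    if situation.get("irreversible") or situation.get("harms_others"):
        return "hard deontological guardrails"  # don't run raw EV math near cliffs
    if situation.get("novel") and situation.get("time_to_think"):
        return "explicit cost-benefit analysis"
    return "default habits and local norms"  # cheap and usually good enough

print(choose_mindset({"novel": True, "time_to_think": True}))
print(choose_mindset({"irreversible": True}))
print(choose_mindset({}))
```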
So I can agree that revising either your beliefs or your actions so that they agree generates an important benefit, but the more important question is whether that benefit overcomes the associated cost I just mentioned, given the fundamental and structural imperfections of the human mind. I suspect Scott thinks it does not: he would probably say it would be axiologically good if you could do so (the world-state in which you make your beliefs and actions coherent is “better” than the one in which you don’t, all else equal), but because all else is not equal in the reality we live our lives in, it would not be the best option to choose for virtually all humans.
(Upon reflection, it could be that what I am saying here is totally beside the point of your initial dialogue with Anthony, and I apologize if that’s the case)