I’m not saying that people “should try” to use their beliefs to model and act in reality.
I’m saying that some people’s minds are set up such that stated beliefs are by default reports about a set of structurally integrated (and therefore logically consistent) constraints on their anticipations. Others’ minds seem to be concerned with making socially desirable assertions, where apparent consistency is a desideratum. The first group is going to have no trouble at all “acting in accordance with [their] stated beliefs about the world” so long as they didn’t lie when they stated their beliefs, and the sort of accountability you’re talking about seems a bit silly. The second group is going to have a great deal of trouble, and accountability will at best cause them to perform consistency when others are watching, not to take initiative based on their beliefs. (Cf. Guess culture screens for trying to cooperate.)
The first group is going to have no trouble at all “acting in accordance with [their] stated beliefs about the world” so long as they didn’t lie when they stated their beliefs
This seems weakly plausible but unlikely to me. Due to computational limitations it seems basically impossible to act in accordance with all of your stated beliefs; there will always be some level of contradiction among your beliefs, as well as between your beliefs and your actions. Figuring out how to leverage more than your own brain to notice inconsistencies in your actions and beliefs seems like a desirable goal for basically everyone.
And not only that: your internal beliefs are usually far too complicated to easily communicate to someone else. So even if you have internally consistent beliefs, it’s a major challenge to communicate them in a way that allows other people to understand your consistency. This is why there is a tension between accuracy and transparency here (and hence the tradeoff I am pointing to in your choice of which group to be accountable to).
To maybe make it more clear, accountability has two major benefits, both of which seem highly desirable to basically every person:
Allow other people to help you notice contradictions in your beliefs and in the correspondence of your beliefs to your actions.
Allow others to predict your future actions, to engage in contracts and positive-sum trade with you, and thereby to coordinate with you on future actions.
To maybe respond more concretely to your top-level comment, which I think I now understand better, I do think that a more continuous model is accurate here, though I share at least a bit of your sense (or at least what I perceive to be your sense) of there being some discrete shift between the two different modes of thinking.
I do, however, think that people can change what their primary mode of thinking is (at least over the course of years). I also think that for most people (and definitely for me) there is often an unendorsed temptation to use the profession of beliefs as speech acts rather than as reports of anticipated constraints on my future observations, and I benefit a lot from being in an environment in which I am rewarded for doing the latter and not the former.
This exchange has given me the feeling of pushing on a string, so instead of pretending that I feel like engaging on the object level will be productive, I’m going to try to explain why I don’t feel that way.
It seems to me like you’re trying to find an angle where our disagreement disappears. This is useful for papering over disagreements or pushing them off, which can be valuable when that reallocates attention from zero-sum conflict to shared production or trade relations. But that’s not the sort of thing I’d hope for on a rationalist forum. What I’d expect there is something more like double-cruxing, trying to find the angle at which our core disagreement becomes most visible and salient.
Sentences like this seem like a strong tell to me:
I do think that a more continuous model is accurate here, though I share at least a bit of your sense (or at least what I perceive to be your sense) of there being some discrete shift between the two different modes of thinking.
While “I think you’re partly wrong, but also partly right” is a position I often hold about someone I’m arguing with, it doesn’t clarify things any more than “let’s agree to disagree.” It can set the frame for a specific effort to articulate what exactly I think is wrong under what circumstances. What I would have hoped to see from you would have been more like:
If you don’t see why I care about pointing out this distinction, you could just ask me why you should care.
If you think you know why I care but disagree, you could explain what you think I’m missing.
If you’re unsure whether you have a good sense of the disagreement, you could try explaining how you think our points of view differ.
Thanks for popping up a meta-level. Seems reasonable in this circumstance.
I agree with you that that one paragraph is mostly doing the “I think you’re partly wrong, but also partly right” thing, but the rest of my comment doesn’t really do that, so I am a bit sad/annoyed that you perceived that to be my primary intention (or at least that’s what I read into your above comment).
I also think that paragraph is doing some other important work that isn’t only about the “let’s avoid a zero-sum conflict situation”, but I don’t really want to go into that too much, since I expect it to be less valuable than the other conversations we could be having.
The rest of my comment is pointing out some relatively concrete considerations that make me doubt the things that you are saying. I have a model in my head of where you are coming from, and can see how it conflicts with other parts of reality that seem a lot more robust than the justifications that I think underlie your model.
I don’t yet have a sense that you see those parts of reality that make me think your models are unlikely to be correct, so I was trying primarily to point them out to you, and then for you either to respond by showing how you have actually integrated them, or to change your mind.
I think this mostly overlaps with your second suggested frame, so I guess we can just continue from there. I think I know why you care, and can probably give at least an approximate model of where you are coming from. I tried to explain what I think you are missing, which was concretely the concerns around bounded computation and the relatively universal need for people to coordinate with other people, which seem to me to contradict some of the things you are saying.
Also happy to give a summary of where I think you are coming from, and what my best guess of your current model is. While I see some contradictions in your model (or my best guess of it), it does seem actually important to point out that I’ve found value in thinking about it and am interested in seeing it fleshed out further (and am as such interested in continuing this conversation).
This could either happen in the form of...
you trying to more explicitly summarize what you think my current model is missing (probably by summarizing my model first),
or by me summarizing your model and asking some clarifying questions,
or by you responding to my concrete objections in an analytic way,
or by me responding to your latest comment (though I don’t really know how to do that, since something about the expected frame of that reply feels off)
I don’t really have any super strong preference for any of these, but will likely not respond for a day. After that, I will try summarizing your perspective a bit more explicitly and then either ask some followup questions or point out the contradictions I currently see in it more explicitly.
I don’t understand the relevance of your responses to my stated model. I’d like it if you tried to explain why your responses are relevant, in a way that characterizes what you think I’m saying more explicitly.
My other most recent comment tries to show what your perspective looks like to me, and what I think it’s missing.
I think this is the most helpful encapsulation I’ve gotten of your preferred meta-frame.
I think I mostly just agree with it now that it’s spelled out a bit better. (I think I have some disagreements about how exactly rationalist forums should relate to this, and what moods are useful. But in this case I basically agree that the actions you suggest at the end are the right move, and it seems better to focus on that.)
This seems like a proposal to use the same kinds of postural adjustments on a group that includes anatomically complete human beings, and lumps of clay. Even if there’s a continuum between the two, if what you want to produce is the former, adjustments that work for the latter are going to be a bad idea.
If someone’s inconsistencies are due to an internal confusion about what’s true, that’s a different situation requiring a different kind of response from the situation in which those inconsistencies are due to occasionally lying when they have an incentive to avoid disclosing their true belief structure. Both are different from one in which there simply isn’t an approximately coherent belief structure to be represented.
Can’t answer for habryka, but my current guess of what you’re pointing at here is something like: “the sort of drive towards consistency is part of an overall pattern that seems net harmful, and the correct action is more like stopping and thinking than like ‘trying to do better at what you were currently doing’.”
(You haven’t yet told me whether this comment successfully passed your ITT, but it’s my working model of your frame.)
I think habryka (and separately, but not coincidentally, me) has a belief that he’s the sort of person for whom looking for opportunities to improve consistency is beneficial. I’m not sure whether you’re disagreeing with that, or whether your point is more that the median LessWronger will take the wrong advice from this?
[Assuming I’ve got your frame right, I obviously disagree quite a bit – but I’m not sure what to do about it locally, here]
Thanks for checking—I’m trying to say something pretty different.
It seems like the frame of the OP is lumping together the kind of consistency that comes from using the native architecture to model the deep structure of reality (see also Geometers, Scribes, and the structure of intelligence), and the kind of consistency that comes from trying to perform a guaranteed level of service for an outside party (see also Unreal’s idea of Dependability), and an important special case of the latter is rule-following as a form of submission or blame-avoidance. These are very different mental structures, respond very differently to incentives, and learn very different things from criticism. (Nightmare of the Perfectly Principled is my most direct attempt to point to this distinction.)
People who are trying to submit or avoid blame will try to alleviate the pressure of criticism with minimal effort, in ways that aren’t connected to their other beliefs. On the other hand, people with structured models will sometimes leapfrog past the critic, or jump in another direction entirely, as Benito pointed out in A Sketch of Good Communication.
If we don’t distinguish between these cases, then attempts to reason about the “optimal” attitude towards integrity or accountability will end up a lumpy, unsatisfactory linear compromise between the following policy goals:
Helping people with structurally integrated models notice tensions in their models that they can learn from.
Distinguishing people with structurally integrated models from those who (at least in the relevant domain) are mostly just trying not to stick out as wrong, so we can stop listening to the second group.
Establishing and enforcing the norms needed to coordinate actions among equals (e.g. shared expectations about promises).
Compelling a complicated performance from inferiors, or avoiding punishment by superiors trying to compel a complicated performance from you.
Converting people without structurally integrated models into people with structurally integrated models (or vice versa).
Depending on what problem you’re trying to solve, habryka’s statement that “if someone changes their stated principles in an unpredictable fashion every day (or every hour), then I think most of the benefits of openly stating your principles disappear” can be almost exactly backwards.
If your principles predictably change based on your circumstances, that’s reasonably likely to be a kind of adversarial optimization similar to A/B testing of communication. At the least, such principles don’t mean their literal content.
But there’s plenty of point in principles consistent with learning new things fast. In that case, change represents noise, which is costly, but much less costly than messaging optimized for extraction. And of course changing principles doesn’t need to imply a change in behavior to match—your new principles can and should take into account the fact that people may have committed resources based on your old stated principles.
In summary, my objection is that habryka seems to be thinking of beliefs as a special case of promises, while I think that if we’re trying to succeed based on epistemic rationality, we should be modeling promises as a special case of beliefs. For more detail on that, see Bindings and Assurances.
Agree strongly with this decomposition of integrity. They’re definitely different (although correlated) things.
My biggest disagreement with this model is that the first form (structurally integrated models) seems to me to be something broader? Something like, you have structurally integrated models of how things work and what matters to you, and take the actions suggested by the models to achieve what matters to you based on how things work?
Need to think through this in more detail. One can have what one might call integrity of thought without what one might call integrity of action based on that thought—you have the models, but others (and you) can’t count on you to act on them. And you can have integrity of action without integrity of thought, in the sense that you can be counted on to perform certain actions in certain circumstances; in that case you’ll do them whether or not it makes any sense, but you can at least be counted on. Or you can have both.
And I agree you have to split integrity of action into keeping promises when you make them slash following one’s own code, and keeping to the rules of the system slash following others’ codes, especially codes that determine what is blameworthy. To me, that third special case isn’t integrity. It’s often a good thing, but it’s a different thing—it counts as integrity if and only if one is following those rules because of one’s own code saying one should follow the outside code. We can debate under what circumstances that is or isn’t the right code, and should.
So I think for now I have it as Integrity-1 (Integrity of Thought) and Integrity-2 (Integrity of Action), and a kind of False-Integrity-3 (Integrity of Blamelessness) that is worth having a name for, and tracking who has and doesn’t have it in what circumstances to what extent, like the other two, but isn’t obviously something it’s better to increase than decrease by default. Whereas Integrity-1 is by default to be increased, as is Integrity-2, and if you disagree with that, this implies to me there’s a conflict causing you to want others to be less effective, or you’re otherwise trying to do extraction or be zero sum.
It seems to me that integrity of thought is actually quite a lot easier if it constrains the kind of anticipations that authentically and intuitively affect actions. Actions can still diverge from beliefs if someone with integrity of thought gets distracted enough to drop into a stereotyped habit (e.g. if I’m a bit checked out while driving and end up at a location I’m used to going to instead of the one I need to be at) or is motivated to deceive (e.g. corvids that think carefully about how to hide their food from other corvids).
The kind of belief-action split we’re used to seeing, I think, involves a school-broken sort of “believing” that’s integrated with the structures that are needed to give coherent answers on tests, but severed from thinking about one’s actual environment and interests.
The most important thing I did for my health in the last few years was healing this split.
False-Integrity-3 seems to me like its name could be “Integrity of Innocence.”
The concerns here make sense.
Something I still can’t tell about your concern, though: one of the things that seemed like the “primary takeaway” here, at least to me, is the concept of thinking carefully about who you want to be accountable to (and being wary of holding yourself accountable to too many people who won’t be able to understand more complex moral positions you might hold).
So far, having thought through that concept through the various lenses of integrity you list here, it doesn’t seem like something that’s likely to make things worse. Do you think it is?
(The other claim, to think about what sort of incentives you want for yourself, does seem like the sort of thing some people might interpret as an instruction to create coercive environments for themselves. That is fairly different from what I think habryka was aiming at, but I’d agree that might not be clear from this post.)
I don’t think I find that objectionable; it just didn’t seem particularly interesting as a claim. It’s as old as “you can only serve one master,” God vs. Mammon, etc.—you can’t do well at accountability to mutually incompatible standards. I think it depends a lot on the type and scope of accountability, though.
If the takeaway were what mattered about the post, why include all the other stuff?
I think habryka was trying to get across a more cohesive worldview rather than just a few points. I also don’t know that my interpretation is the same as his. But here are some points that I took from this. (Hmm. These may not have been even slightly obvious in the current post, but were part of the background conversation that prompted it, and would probably have eventually been brought up in a future post. And I think the OP at least hints at them)
On Accountability
First, I think there are people in the LessWrong readership who still have some naive conception of “be accountable to the public”, which is in fact a recipe for trouble.
It’s as old as “you can only serve one master,”
This is pretty different from how I’d describe this.
In some sense, you only get to serve one master. But value is complex and fragile. So there may be many facets of integrity or morality that you find important to pay attention to, and you might be missing some of them. Your master may contain multitudes.
Any given facet of morality (or, more generally, of the things I care about) is complicated. What I might want, in order to hold myself to a higher standard than I currently meet, is to have several people whom I hold myself accountable to, each of whom pays deep attention to a different facet.
If I’m running a company, I might want to be accountable to
people who deeply understand the industry I’m working in
people who deeply understand human needs, coercion, etc, who can tell me if I’m mistreating my workers,
people who understand how my industry interacts with other industries, the general populace, or the environment, who can call me out if I’m letting negative externalities run wild.
[edit] maybe just a kinda regular person who sanity-checks whether I seem crazy
I might want multiple people for each facet, who look at that facet through a different lens.
By having these facets represented in concrete individuals, I also improve my ability to resolve confusions about how to trade off multiple sacred values. Each individual might deeply understand their domain and see it as most important. But if they disagree, they can doublecrux with each other, or with me, and I can try to integrate their views into something coherent and actionable.
There’s also the important operationalization of “what does accountable mean?” There are different powers you could give these people, possibly including:
Emergency Doublecrux button – they can demand N hours of your time per year, at least forcing you to have a conversation to justify yourself
Vote of No Confidence – i.e. if your project still seems good but you seem corrupt, they can fire you and replace you
Shut down your project, if it seems net negative
There might be some people you trust with some powers but not others (i.e. you might think someone has good perspectives that justify the emergency double crux button but not the “fire you” button)
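To make the shape of this concrete, here is a minimal sketch in Python of one way to represent it. Everything in it (the names Power and AccountabilityGrant, the example people and numbers) is my own illustration rather than anything proposed in the thread; the point it shows is just that “accountable to” decomposes into specific, separately grantable powers, so “some powers but not others” becomes something you can write down and query.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Power(Enum):
    """Hypothetical powers an accountability partner might hold."""
    EMERGENCY_DOUBLECRUX = auto()  # can demand N hours of conversation per year
    NO_CONFIDENCE_VOTE = auto()    # can push to replace you while keeping the project
    SHUT_DOWN_PROJECT = auto()     # can halt the project if it looks net negative


@dataclass
class AccountabilityGrant:
    """One person you hold yourself accountable to, and what they can do."""
    person: str
    facet: str                      # which facet they pay deep attention to
    powers: set[Power] = field(default_factory=set)
    doublecrux_hours_per_year: int = 0


# Example: trusting different people with different facets and different powers.
grants = [
    AccountabilityGrant("industry expert", "object-level industry knowledge",
                        {Power.EMERGENCY_DOUBLECRUX}, doublecrux_hours_per_year=10),
    AccountabilityGrant("labor/coercion expert", "treatment of workers",
                        {Power.EMERGENCY_DOUBLECRUX, Power.NO_CONFIDENCE_VOTE}),
    AccountabilityGrant("externalities expert", "effects on everyone else",
                        {Power.EMERGENCY_DOUBLECRUX, Power.SHUT_DOWN_PROJECT}),
]


def who_can(power: Power, grants: list[AccountabilityGrant]) -> list[str]:
    """List the people trusted with a given power."""
    return [g.person for g in grants if power in g.powers]


if __name__ == "__main__":
    print(who_can(Power.SHUT_DOWN_PROJECT, grants))  # ['externalities expert']
```

Nothing hinges on the specific powers listed; the design choice being illustrated is only that each grant pairs a facet with an explicit set of powers, rather than a single undifferentiated notion of “being accountable.”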
There’s a somewhat different conception you could have of all this that’s more coalition focused than personal development focused.
I feel like all of this mixes together info sources and incentives, so it feels a bit wrong to say I agree, but also feels a bit wrong to say I disagree.
I agree that there’s a better, crisper version of this that has those more distinct.
I’m not sure if the end product, for most people, should keep them distinct because by default humans seem to use blurry clusters of concepts to simplify things into something manageable.
But I think if you’re aiming to be a robust agent, or to build a robustly agentic organization, there is something valuable about keeping these crisply separate so you can reason about them well. (You’ve previously mentioned that this is analogous to the friendly AI problem, and I agree.) I think it’s a good project for many people in the rationalsphere to have undertaken to deepen our understanding, even if it turns out not to be practical for the average person.
The “different masters” thing is a special case of the problem of accepting feedback (i.e. learning from approval/disapproval or reward/punishment) from approval functions in conflict with each other or your goals. Multiple humans trying to do the same or compatible things with you aren’t “different masters” in this sense, since the same logical-decision-theoretic perspective (with some noise) is instantiated on both.
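As a toy numerical sketch of that distinction (my own illustration, not anything from the comment itself): averaging feedback from several noisy instantiations of one approval function recovers that function’s recommendation, while averaging two genuinely conflicting approval functions selects a compromise action that neither of them ranks first.

```python
import random

random.seed(0)

actions = ["A", "B", "C"]

# One underlying approval function ("master"), instantiated with noise by several people.
def base_approval(action: str) -> float:
    return {"A": 1.0, "B": 0.2, "C": 0.0}[action]

def noisy_copy(action: str) -> float:
    return base_approval(action) + random.gauss(0, 0.1)

# Two genuinely conflicting approval functions ("different masters").
def approval_x(action: str) -> float:
    return {"A": 1.0, "B": 0.8, "C": 0.0}[action]

def approval_y(action: str) -> float:
    return {"A": 0.0, "B": 0.8, "C": 1.0}[action]

def best_action(scorers) -> str:
    """Pick the action with the highest average score across the given scorers."""
    return max(actions, key=lambda a: sum(s(a) for s in scorers) / len(scorers))

# Several noisy copies of one perspective still recommend what that perspective recommends.
print(best_action([noisy_copy] * 5))          # 'A', same as base_approval alone

# Conflicting perspectives average to a compromise neither would choose on its own.
print(best_action([approval_x, approval_y]))  # 'B', which neither X nor Y ranks first
```

The numbers are arbitrary; the sketch is only meant to show why several people instantiating the same perspective (plus noise) aren’t “different masters” in the relevant sense, while genuinely conflicting approval functions are.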
But also, there’s all sorts of gathering data from others’ judgment that doesn’t fit the accountability/commitment paradigm.
To give a concrete example, I expect math prodigies to have the easiest time solving any given math problem, but even so, I don’t expect that a system that punishes the students who don’t complete their assignments correctly will serve the math prodigies well. This, even if under other, totally different circumstances it’s completely appropriate to compel performance of arbitrary assignments through the threat of punishment.