The concerns here make sense.
Something I still can’t tell about your concern, though: one of the things that seemed like the “primary takeaway” here, at least to me, is the concept of thinking carefully about who you want to be accountable to (and being wary of holding yourself accountable to too many people who won’t be able to understand the more complex moral positions you might hold).
So far, having thought that concept through the various lenses of integrity you list here, it doesn’t seem like something that’s likely to make things worse. Do you think it is?
(The other claim, that you should think about what sort of incentives you want for yourself, does seem like the sort of thing some people might interpret as an instruction to create coercive environments for themselves. That’s fairly different from what I think habryka was aiming at, but I’d agree that might not be clear from this post.)
I don’t think I find that objectionable; it just didn’t seem particularly interesting as a claim. It’s as old as “you can only serve one master,” God vs. Mammon, etc.: you can’t do well at accountability to mutually incompatible standards. I think it depends a lot on the type and scope of accountability, though.
If the takeaway were what mattered about the post, why include all the other stuff?
I think habryka was trying to get across a more cohesive worldview rather than just a few points. I also don’t know that my interpretation is the same as his. But here are some points that I took from this. (Hmm. These may not have been at all obvious in the current post, but they were part of the background conversation that prompted it, and would probably have been brought up eventually in a future post. And I think the OP at least hints at them.)
On Accountability
First, I think there are people in the LessWrong readership who still have some naive conception of “be accountable to the public”, which is in fact a recipe for holding yourself accountable to lots of people who won’t be able to understand the more complex moral positions you might hold.
As for “it’s as old as ‘you can only serve one master’”: this is pretty different from how I’d describe it.
In some sense, you only get to serve one master. But value is complex and fragile, so there may be many facets of integrity or morality that you find important to pay attention to, and you might be missing some of them. Your master may contain multitudes.
Any given facet of morality (or, more generally, of the things I care about) is complicated. What I might want, in order to hold myself to a higher standard than I currently meet, is to have several people whom I hold myself accountable to, each of whom pays deep attention to a different facet.
If I’m running a company, I might want to be accountable to:
people who deeply understand the industry I’m working in
people who deeply understand human needs, coercion, etc., who can tell me if I’m mistreating my workers,
people who understand how my industry interacts with other industries, the general populace, or the environment, who can call me out if I’m letting negative externalities run wild.
[edit] maybe also just a regular person who sanity-checks whether I seem crazy
I might want multiple people for each facet, each looking at that facet through a different lens.
By having these facets represented in concrete individuals, I also improve my ability to resolve confusions about how to trade off multiple sacred values. Each individual might deeply understand their domain and see it as most important. But if they disagree, they can doublecrux with each other, or with me, and I can try to integrate their views into something coherent and actionable.
There’s also the important question of operationalization: what does “accountable” mean? There are different powers you could give these people, possibly including:
Emergency Doublecrux button – they can demand N hours of your time per year, at least forcing you to have a conversation to justify yourself
Vote of No Confidence – i.e. if your project still seems good but you seem corrupt, they can fire you and replace you
Shut down your project, if it seems net negative
There might be some people you trust with some powers but not others (e.g. you might think someone has good perspectives that justify the emergency doublecrux button but not the “fire you” button).
There’s a somewhat different conception you could have of all this that’s more coalition-focused than personal-development-focused.
I feel like all of this mixes together info sources and incentives, so it feels a bit wrong to say I agree, but it also feels a bit wrong to say I disagree.
I agree that there’s a better, crisper version of this that keeps those more distinct.
I’m not sure whether the end product, for most people, should keep them distinct, because by default humans seem to use blurry clusters of concepts to simplify things into something manageable.
But I think if you’re aiming to be a robust agent, or to build a robustly agentic organization, there is something valuable about keeping these crisply separate so you can reason about them well. (You’ve previously mentioned that this is analogous to the friendly AI problem, and I agree.) I think it’s a good project for many people in the rationalsphere to have undertaken to deepen our understanding, even if it turns out not to be practical for the average person.
The “different masters” thing is a special case of the problem of accepting feedback (i.e. learning from approval/disapproval or reward/punishment) from approval functions that are in conflict with each other or with your goals. Multiple humans trying to do the same or compatible things with you aren’t “different masters” in this sense, since the same logical-decision-theoretic perspective (with some noise) is instantiated in each of them.
But also, there are all sorts of ways of gathering data from others’ judgment that don’t fit the accountability/commitment paradigm.