This model seems to assume that the “oversight” of the “normal system” at the center of the gravity well is trustworthy.
On the core point, I think you improve / fix problems with the normal system in the boring, hard ways, and do deeply appreciate you championing particular virtues even when I disagree on where the balance of virtues lies.
I find something off-putting here about the word “trustworthy,” because I feel like it’s a 2-place word; I think of oversight as something like “good enough to achieve standard X”, whereas “trustworthy” alone seems to imply there’s a binary standard that is met or not (and has been met). It seems like we could easily have very different standards for trustworthiness that cause us to not disagree on the facts while disagreeing on the implications.
(Somehow, it reminds me of this post and Caledonian’s reaction to it.)
Have you thought at all about how to prevent the center of the gravity well from becoming predatory?
Yes. Mostly this has focused on recruitment work for MIRI, where we really don’t want to guilt people into working on x-risk reduction (not only is it predatory, it’s also a recipe for them burning out instead of being productive, so morality and efficiency obviously align), and yet most of the naive ways to ask people to consider working on x-risk reduction risk guilting them, and you need a more sophisticated way to remove that failure mode than just saying “please don’t interpret this as me guilting you into it!”. This is a topic where I’ve already written up parts of my longer thoughts here.
And, obviously, when I think about moderating LessWrong, I think about how to not become corrupt myself, and what sorts of habits and systems lower the chances of that, or make it more obvious if it does happen.
it’s a 2-place word [...] It seems like we could easily have very different standards for trustworthiness that cause us to not disagree on the facts while disagreeing on the implications.
Right, I agree that we don’t want to get into a pointless pseudo-argument where everyone agrees that x = 60, and yet we have a huge shouting match over whether this should be described using the English word “large” or “small.”
Maybe a question that would lead to a more meaningful disagreement would be, “Should our culture become more or less centralized?”—where centralized is the word I’m choosing to refer to a concept I’m going to try to describe extensionally/ostensively in the following two paragraphs.[1]
A low-centralization culture has slogans like, “Nullius in verba” or “Constant vigilance!”. If a fringe master sets up shop on the outskirts of town, the default presumption is that (time permitting) you should “consider it open-mindedly and then steal only the good parts [...] [as] an obvious guideline for how to do generic optimization”, not because most fringe masters are particularly good (they aren’t), but because thinking for yourself actually works and it’s not like our leaders in the town center know everything already.
In a high-centralization culture, there’s a stronger presumption that our leaders in the town center come closer to knowing everything already, and that the reasoning styles or models being hawked by fringe masters are likely to “contain traps that the people absorbing the model are unable to see”: that is, thinking for yourself doesn’t work. As a result, our leaders might talk up “the value of having a community-wide immune system” so that they can “act against people who are highly manipulative and deceitful before they have clear victims.” If a particular fringe master starts becoming popular, our leaders might want to announce that they are “actively hostile to [the fringe master], and make it clear that [we] do not welcome support from those quarters.”
You seem to be arguing that we should become more centralized. I think that would be moving our culture in the absolute wrong direction. As long as we’re talking about patterns of adversarial optimization, I have to say that, to me, this kind of move looks optimized for “making it easier to ostracize and silence people who could cause trouble for MIRI and CfAR (e.g., Vassar or Ziz), either by being persistent critics or by embarrassing us in front of powerful third parties who are using guilt-by-association heuristics”, rather than improving our collective epistemics.
This seems like a substantial disagreement, rather than a trivial Sorites problem about how to use the word “trustworthy”.
do deeply appreciate you championing particular virtues even when I disagree on where the balance of virtues lies
Thanks. I like you, too.
[1] I just made this up, so I’m not at all confident this is the right concept, much like how I didn’t think contextualizing-vs.-decoupling was the right concept.
not because most fringe masters are particularly good (they aren’t), but because thinking for yourself actually works and it’s not like our leaders in the town center know everything already.
I think the leaders in the town center do not know everything already. I think different areas have different risks when it comes to “thinking for yourself.” It’s one thing to think you can fly and jump off a roof yourself, and another thing to think it’s fine to cook for people when you’re Typhoid Mary, and I worry that you aren’t drawing a distinction here between those cases.
I have thought about this a fair amount, but am not sure I’ve discovered the right conceptual lines here, and would be interested in how you would distinguish between the two cases, or if you think they are fundamentally equivalent, or that one of them isn’t real.
You seem to be arguing that we should become more centralized. I think that would be moving our culture in the absolute wrong direction.
In short, I think there are some centralizing moves that are worth it, and others that aren’t, and that we can choose policies individually instead of just throwing the lever on “centralization: Y/N”. Well-Kept Gardens Die by Pacifism is ever relevant; here, the thing that seems relevant to me is that there are some basic functions that need to happen (like, say, the removal of spam), and fulfilling those functions requires tools that could also be used for nefarious functions (as we could just mark criticisms of MIRI as ‘spam’ and they would vanish). But the conceptual categories that people normally have for this are predicated on the interesting cases; sure, both Nazi Germany and WWII America imprisoned rapists, but the interesting imprisonments are of political dissidents, and we might prefer WWII America because it had many fewer such political prisoners, and further prefer a hypothetical America that had no political prisoners. But this spills over into the question of whether we should have prisons or justice systems at all, and I think people’s intuitions on political dissidents are not very useful for what should happen with the more common sort of criminal.
Like, it feels almost silly to have to say this, but I like it when people put forth public positions that are critical of an idea I favor, because then we can argue about it and it’s an opportunity for me to learn something, and I generally expect the audience to be able to follow it and get things right. Like, I disagreed pretty vociferously with The AI Timelines Scam, and yet I thought the discussion it prompted was basically good. It did not ping my Out To Get You sensors in the way that ialdabaoth does. To me, this feels like a central example of the sort of thing you see in a less centralized culture where people are trying to think things through for themselves and end up with different answers, and is not at risk from this sort of moderation.
I don’t think this conversation is going to make any progress at this level of abstraction and in public. I might send you an email.
I look forward to receiving it.