High Stakes Value and the Epistemic Commons
I’ve had this in my drafts for a year. I don’t feel like the current version of it is saying something either novel or crisp enough to quite make sense as a top-level post, but wanted to get it out at least as a shortform for now.
There’s a really tough situation I think about a lot, from my perspective as a LessWrong moderator. These are my personal thoughts on it.
The problem, in short:
Sometimes a problem is epistemically confusing and has political ramifications, such that the most qualified people to debate it are also in conflict with each other, with billions of dollars on the line. Meanwhile the situation is really high stakes (i.e. the extinction of humanity), so it really matters that we get the question right.
Political conflict + epistemic murkiness means that it’s not clear what “thinking and communicating sanely” about the problem looks like, and people have (possibly legitimate) reasons to be suspicious of each other’s reasoning.
High Stakes means that we can’t ignore the problem.
I don’t feel like our current rationalist discourse patterns are sufficient for this combination of high stakes, political conflict, and epistemic murkiness.
Spelling out some concrete examples
Interventions that help with AI extinction risk are often hard to evaluate. Reasonable people can disagree about whether a given project is highly net positive or highly net negative. Smart people I know have fairly different strategic perspectives on how humanity can survive the 21st century.
Sometimes these disagreements are political – are pivotal acts a helpful frame or a harmful one? How suspicious should we be of safetywashing?
Sometimes these disagreements are more technical. How will differential technology play out? I’ve heard some arguments that improving alignment techniques on current-generation ML systems may be net negative, because a) it won’t actually help align powerful AI systems past the sharp left turn, and meanwhile b) it makes it easier and more profitable to deploy AI in ways that could start to escalate beyond our control (killing us in slightly more mundane ways than the fast takeoff scenarios).
I’ve heard arguments that even interpretability, which you’d think is a purely positive source of information, is also helpful for capabilities (in particular if the interpretability is actually any good). And maybe you actually need a lot of interpretability before the alignment benefits outweigh the capability gains.
Some disagreements lie at the intersection of the political, the technical, and the psychological.
You might argue that people in AGI companies are motivated by excitement over AI, or by making money, and are only paying lip service to safety. Your beliefs about this might include “their technical agenda doesn’t make any sense to me” as well as “I have a strong guess about what else might be motivating their agenda.”
You might think AI Risk advocates are advocating pivotal acts because of a mix of trauma, finding politics distasteful, and technical mistakes regarding the intersection of boundaries and game theory.
This is all pretty gnarly, because
The stakes here matter a lot. We’re talking about the end of the world and/or the cosmic endowment.
Common professional politeness norms typically paper over conflict rather than leaning into it. There aren’t (as many) consensus “professional” norms for dealing directly with high-stakes conflict in ways that preserve epistemics.
A lot of conversation has been going on for a long time, but not everyone’s participated in the same arguments, so some people feel like “We’ve had the object-level arguments and I don’t really know where to go other than to say ‘it sure looks to me like you’re psychologically motivated here’”, and others are like “why are you jumping to assumptions about me when AFAICT we haven’t even hashed out the object level?”
People disagree about what counts as good concrete technical arguments, and I think at least some (though IMO not all) of that disagreement is for fairly reasonable reasons.
I have some guesses about how to think about this, but I feel some confusion about them. And it feels pedagogically bad for this to end with “and therefore, [Insert some specific policy or idea]” rather than “okay, what is the space of considerations and desiderata here?” a la Hold Off On Proposing Solutions.
This intersects sharply with your prior post about feedback loops, I think.
Just as it is really hard / maybe impossible (???) for individuals to reason well in situations where they don’t have a feedback loop, it is really hard / maybe impossible for a community to reason well in a situation without feedback loops.
Like at some point, in a community, you need to be able to point to (1) canonical works that form the foundation of further thought, (2) examples of good reasoning to be imitated by everyone. If you don’t have those, you have a sort of glob of memes and ideas and shit that people can talk about to signal that they “get it,” but it’s all kinda arbitrary and conversation cannot move on because nothing is ever established for sure.
And like—if you never have clear feedback, I think it’s hard to have canonical works / examples of good reasoning other than by convention and social proof. There are works in LW which you have to have read in order to continue various conversations, but whether these works are good or not is highly disputed.
I of course have some proposed ideas for how to fix the situation—this—but my proposed ideas would clean out the methods of reasoning and argument with which I disagree, which is indeed the problem.
I don’t have a super strong memory of this, did you have a link? (not sure how directly relevant but was interested)
Your memory is fine, I was writing badly—I meant the ideas I would propose rather than the ideas I have proposed by “proposed ideas.” The flavor would be something super-empiricist like this, not that I endorse that as perfect. I do think ideas without empirical restraint loom too large in the collective.
Have you considered hosting a discussion on this topic? I’m sure you’ve already had some conversations about it, but a public one could help surface additional ideas and/or perspectives that could help you make sense of this.