Nothing like taking over the world. From a certain angle it’s almost the opposite of that: relinquishing some control.
The observations in my long comment suggest to me some different angles for how to talk about alignment risk. They are part of a style of discourse that is not well-respected on LessWrong, and the fact that this space pushes that style out is probably good for the health of LessWrong. But broader popular political/ethical discourse puts a lot of weight on these types of arguments, and they’re more effective (because they push around so much social capital) at convincing engineers that they have an external responsibility.
I don’t want to be too specific with the arguments unless I pull the trigger on writing something longer form. I was being a little cheeky at the end of that comment and since I posted it I’ve been convinced that there’s more harm in expressing that idea ineffectively or dismissively than I’d originally estimated (so I’m grateful to the social mechanism that prevented me from causing harm there!).
A success story would look like building a memeplex that is entirely truthful, even if it’s distasteful to rationalists, and putting it out into the world, where it would elevate alignment to the status that fairness/accountability have in the ML community. This would be ideologically decentralizing to the field; in this success story I don’t expect the aesthetics of LessWrong, Alignment Forum, etc to be an adequate home for this new audience, and I would predict something that sounds more like that link from Percy Liang becoming the center of the conversation. It would be a fun piece of trivia that this came from the rationalist community, and look at all this other crazy stuff they said! We might hear big names in AI say of Yudkowsky what Wittgenstein said of Russell:
Russell’s books should be bound in two colours…those dealing with mathematical logic in red – and all students of philosophy should read them; those dealing with ethics and politics in blue – and no one should be allowed to read them.
This may be scary because it means the field would be less aligned than it is currently. My instinct says this is a kind of misalignment that we’re already robust to: ideological beliefs in other scientific communities are often far more heterogeneous than those of the AI alignment community. It may be scary because the field would become more political, which may end up lowering effectiveness, contra my hypothesis that growing the field this way would be effective. It may be scary because it’s intensely status-lowering in some contexts for anyone who would be reading this.
I’m still on the fence about whether this would be good on net. Every time I see a post about the alignment discourse among broader populations, I interpret it as people interested in having some of this conversation, and I’ll keep probing.
EDIT: retracted in this context, see reply.
So… growing alignment by… merging it with far-less-important political issues… and being explicitly culture-war-y about it? Is the endgame getting non-right-wing politicians to regulate AGI development?
Because… for [structural](https://en.wikipedia.org/wiki/United_States_Electoral_College) [reasons](https://en.wikipedia.org/wiki/Gerrymandering), one side has an advantage in many culture war battles, despite being a minority of citizens at the national level. (Within a state, either "side" could have the advantage, but whichever one has the advantage tends to keep it for a while.)
~~Turning AGI safety into a culture war thing is a bad idea, since [culture war things don’t seem to get much actual progress in Congress](https://www.slowboring.com/p/the-rise-and-importance-of-secret), and when they do it’s often something bad (see "structural reasons" above).~~
If this was your idea, I guess I’m glad you didn’t post it, but I also think you should think harder about whether it is, in fact, actually a good idea. Upvoted to commend your honesty though (glad 2-axis voting is on now).
TLDR: P(culture war win & it’s in favor of regulation) + P(cww & it’s anti-regulation) + P(culture war loss) ≈ 1, and the first term is, by a quick reading of politics, outweighed by the combination of the other 2 terms.
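Spelled out (my shorthand, under the assumption that the three outcomes are mutually exclusive and exhaustive, with W = “a culture war over AGI regulation is won” and R = “the winning side favors regulation”):

$$P(W \cap R) + P(W \cap \lnot R) + P(\lnot W) = 1$$

so saying the first term is outweighed by the other two together is the same as saying

$$P(W \cap R) < 1 - P(W \cap R), \quad \text{i.e. } P(W \cap R) < \tfrac{1}{2}.$$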
This is a bit of an odd time to start debating, because I haven’t explicitly stated a position, and it seems we’re in agreement that that’s a good thing[1]. I’m calling this to attention because:
1. You make good points.
2. The idea you’re disagreeing with digresses, multiple times in its first two sentences, from any idea I would endorse.
Speaking first to this point about culture wars: that all makes sense to me. By this argument, “trying to elevate something to being regulated by Congress by turning it into a culture war is not a reliable strategy” is probably a solid heuristic.
I wonder whether we’ve lost the context of my top-level comment. The scope (the “endgame”) I’m speaking to is moving alignment into the set of technical safety issues that the broader ML field recognizes as its responsibility, as has happened with fairness. My main argument is that a typical ML scientist/engineer tends not to use systemic thought to adjudicate which moral issues are important, and this is instead “regulated by tribal circuitry” (to quote romeostevensit’s comment). This does not preclude their having requisite technical ability to make progress on the problem if they decide it’s important.
As far as strategic ideas, it gets hairy from there. Again, I think we’re in agreement that it’s good for me not to come out here with a half-baked suggestion[1].
–––
There’s a smaller culture war, a gray-vs-blue one, that’s been raging for quite some time now, in which more inflamed people argue about punching Nazis and more reserved people argue about whether it’s more important to protect specific marginalized groups or to protect discussion norms and standards of truth.
Here’s a hypothetical question that should bear on strategic planning: suppose you could triple the proportion of capable ML researchers who consider alignment to be their responsibility as an ML researcher, but all of the new population are on the blue side of zero on the protect-groups-vs-protect-norms debate. Is this an outcome more likely to save everyone?
On the plus side, the narrative would have shifted massively away from a bunch of the failure modes Rob identified in the post (this is by assumption: “consider alignment to be their responsibility”).
On the minus side, if you believe that LW/AF/EA-style beliefs/norms/aesthetics/ethics are key to making good progress on the technical problems, you might be concerned about alignment researchers of a less effective style competing for resources.
If no, is there some other number of people who could be convinced in this manner such that you would expect it to be positive on AGI outcomes?
To reiterate:
1. I expect a large portion of the audience here would dislike my ideas about this for reasons that are not helpful.
2. I expect it to be a bad look externally for it to be discussed carelessly on LW.
3. I’m not currently convinced it’s a good idea, and for reasons 1 and 2 I’m mostly deliberating it elsewhere.
Ah, thank you for clarification!
suppose you could triple the proportion of capable ML researchers who consider alignment to be their responsibility as an ML researcher, but all of the new population are on the blue side of zero on the protect-groups-vs-protect-norms debate. Is this an outcome more likely to save everyone?
Allying AI safety with DEI/LGBTQIA+ activism won’t do any favors for AI safety. Nor do I think it’s a really novel idea: Effective Altruism occasionally flirts with DEI, and other people have suggested using similar tactics to get AI safety into the eyes of modern politics.
AI researchers are already linking AI safety with DEI, with the effect of limiting the appearance of risk. If someone were to read a ‘risks’ section of an OpenAI paper, they would come away with the impression that the biggest risk of AI is that someone could use it to make a misleading photo of a politician, or that the AI might think flight attendants are more likely to be women than men! Their risks section on DALL·E 2 reads:
“Use of DALL·E 2 has the potential to harm individuals and groups by reinforcing stereotypes, erasing or denigrating them, providing them with disparately low quality performance, or by subjecting them to indignity.”
[...]
The default behavior of the DALL·E 2 Preview produces images that tend to overrepresent people who are White-passing and Western concepts generally. In some places it over-represents generations of people who are female-passing (such as for the prompt: “a flight attendant” ) while in others it over-represents generations of people who are male-passing (such as for the prompt: “a builder”).
The point being, DEI does not take up newcomers and lend its support to their issues. It subsumes real issues and funnels the efforts directed at solving them towards the DEI wrecking ball.