I shouldn’t be here, but I can’t stay away. Systems that produce intellectual content on an ongoing basis risk settling into low-entropy sink states without my intervention. I’m keeping an eye on you, because I care deeply about the rationalist program.
Frankly, I have an obsession with playing games with the rationalist community and its members. I spent a long time trying to do so maximally cooperatively, pursuing a career in AI safety research; perfectionism was paralyzing, and I got stuck at a rung of that career ladder in a very painful way. I then tried to stay away for years; LessWrong is an attractor I was not able to ignore, and this manifested as internally maligning the community and probably doing subtle downstream harm rather than achieving the intended causal separation.
My current belief is that indulging myself with the intention of some non-maximal cooperation (small but nonzero cosine distance; imperfect alignment) is an effective equilibrium. The first paragraph of this bio is a rationalization of this behavior that I partially believe, and I intend to follow a script like this: stirring pots and making messes only insofar as doing so seems like a plausibly valuable temperature-raising intervention in our (roughly) shared search for epistemic progress.
This is a bit of an odd time to start debating, because I haven’t explicitly stated a position, and it seems we’re in agreement that that’s a good thing[1]. I’m calling this to attention because:
1. You make good points.
2. The idea you’re disagreeing with diverges, multiple times within its first two sentences, from any idea I would endorse.
Speaking first to this point about culture wars: that all makes sense to me. By this argument, “turning an issue into a culture war is not a reliable way to get it regulated by Congress” is probably a solid heuristic.
I wonder whether we’ve lost the context of my top-level comment. The scope (the “endgame”) I’m speaking to is moving alignment into the set of technical safety issues that the broader ML field recognizes as its responsibility, as has happened with fairness. My main argument is that a typical ML scientist/engineer tends not to use systematic thought to adjudicate which moral issues are important; this is instead “regulated by tribal circuitry” (to quote romeostevensit’s comment). That does not preclude their having the requisite technical ability to make progress on the problem if they decide it’s important.
As far as strategic ideas go, it gets hairy from there. Again, I think we’re in agreement that it’s good for me not to come out here with a half-baked suggestion[1].
–––
There’s a smaller culture war, a gray-vs-blue one, that’s been raging for quite some time now, in which the more inflamed people argue about punching nazis and the more reserved people argue about whether it’s more important to protect specific marginalized groups or to protect discussion norms and standards of truth.
Here’s a hypothetical question that should bear on strategic planning: suppose you could triple the proportion of capable ML researchers who consider alignment to be their responsibility as ML researchers, but all of the new population land on the blue side of zero in the protect-groups-vs-protect-norms debate. Does that outcome make it more likely that we save everyone?
On the plus side, the narrative would have shifted massively away from a bunch of the failure modes Rob identified in the post (this is by assumption: “consider alignment to be their responsibility”).
On the minus side, if you believe that LW/AF/EA-style beliefs/norms/aesthetics/ethics are key to making good progress on the technical problems, you might be concerned about alignment researchers of a less effective style competing for resources.
If not, is there some other number of people who could be convinced in this manner such that you would expect the net effect on AGI outcomes to be positive?
To reiterate:
1. I expect a large portion of the audience here would dislike my ideas about this for reasons that are not helpful.
2. I expect it to be a bad look externally for it to be discussed carelessly on LW.
3. I’m not currently convinced it’s a good idea, and for reasons 1 and 2 I’m mostly deliberating it elsewhere.