My hypothesis: I think the incentives for “cultivating more/better researchers in a preparadigmatic field” lean towards “don’t discourage even less-promising researchers, because they could luck out and suddenly be good/useful to alignment in an unexpected way”.
Analogy: This is like how investors encourage startup founders because they bet on a flock of them, not necessarily because any particular founder’s best bet is to found a startup.
If timelines are short enough that [our survival depends on [unexpectedly-good paradigms]], and [unexpectedly-good paradigms] come from [black-swan researchers], then the AI alignment field is probably (on some level, assuming some coordination/game theory) incentivized to black-swan farm researchers.
Note: This isn’t necessarily bad (in fact, it’s probably good overall); it just puts the incentives into perspective, so that individual researchers don’t feel so bad about “not making it” (where “making it” could be “getting a grant” or “getting into a program” or...).
The questions: Is this real or not? What, if anything, should anyone do, with this knowledge in hand?
Well, incubators and many smaller bets are usually the best approach in this type of situation since you want to black-swan farm, as you say.
This is the approach I’m generally taking right now. Similar to the “pyramid scheme” argument for getting more people into EA, I think it’s worth mentoring new people on alignment perspectives.
So some stuff you could do:
1. Start a local organisation where you help people understand alignment and develop their own perspectives. (Note that this only works above a certain quality of thinking, but finding people who clear that bar is also a numbers game IMO.)
2. Create and mentor semi-isolated, independent communities that develop their own alignment theories. (You want people to be weird but not too weird, meaning they should be semi-independent from the larger community.)
3. Run a theoretical alignment conference/competition with proposals optimised for being interesting in weird ways.
I’m trying to do this specifically in the Nordics, as I think there are a bunch of smart people here who don’t want to move to the “talent hubs”, so my marginal impact might be higher. To be honest, I’m uncertain how much to focus on this versus developing my own alignment theories, but I’m a strong believer in the pyramid-scheme approach. I could be wrong about this, though, and I would love to hear more takes on it.
I think it’s possible that talent funnels are soul-killing for the people facilitating them: not just because >50% of participants get churned out, but also because of Goodhart’s law. Evaluating intelligent humans is confusing and difficult, and the hiring/competition culture in American society (as opposed to, say, dath ilan’s) is so Moloch-filled that things get rough even when everyone is 100% on board with alignment.
If true, that suggests a bias against expanding talent funnels because of the emotional difficulty, leading people to estimate the value of talent funnels as lower than it actually is. In reality, the people facilitating talent funnels should be aware of the mental-health sacrifice they are making and decide whether or not they choose to be grist for the X-risk mines.
I can attest to the validity of the premise you’re raising based on my own experience, but there are multiple factors at play. One is the scarcity of resources, which tends to limit opportunities to groups or teams that can demonstrably make effective use of them, whether that’s time, funds, or mentorship. Another factor that is less frequently discussed is the courage to deviate from conventional thinking. It’s inherently risky to invest in emerging alignment researchers who haven’t yet produced tangible results, and making such choices based on a gut feeling about their potential to deliver meaningful contributions can seem unreasonable, especially to funders. There are more layers to this, but nevertheless I hold no bad blood about the landscape. It is what it is.