I’m curious whether the rationalsphere/AI risk community has ever experimented with hiring people to work on serious technical problems who aren’t fully aligned with the community’s values, or aren’t already fully invested in it. Ideological alignment seems like a major bottleneck to locating and attracting people at the relevant skill and productivity levels, and there might be some benefit to being open about tradeoffs that favor skill and productivity at the expense of complete commitment to solving AI risk.
I wrote a post with some reasons to be skeptical of this.
Not sure I agree. When I started working on this topic, I wasn’t much invested in it, just wanted to get attention on LW. Wei has admitted to a similar motivation.
I think the right way to encourage work on AI alignment is by offering mainstream incentives (feedback, popularity, citations, money). You don’t have to think of it as hiring; it can be lower pressure than that. Give them a grant, publish them in your journal, put them on the front page of your website, or just talk about their ideas. If they aren’t thinking in the right direction, you have plenty of opportunity to set the terms of the game.
Not saying this is a panacea, but I feel like many people here focus too much on removing barriers and too little on creating incentives.
I can’t remember the key bit of Ben’s post (and wasn’t able to find it quickly on skimming). But my hot take is:
It seems obviously net-positive for contributions to x-risk work to be rewarded with status within the AI Safety community, but it’s not obviously net-positive for those contributions to be rewarded with serious money or status in the broader world.
If the latter gets too large, then you start getting swarmed with people who want money and prestige but don’t necessarily understand how to contribute, and who are incentivized to degrade the signal of what’s actually important.
To quote a conversation with habryka: there are two ways to make AI Safety prestigious. The first way is to make serious AI Safety work (i.e. solving the alignment problem) prestigious. The second is to change the definition of AI Safety to be something more obviously prestigious in the first place (which may get you things like ‘solving problems with self-driving cars’). And the latter is often easier to do.
So if you make it easier for not-fully-aligned, prestige-motivated people to join the movement, they’ll want to start changing the movement to make it easier for themselves to get ahead.
This isn’t to say this obviously cashes out to “net-negative” either, just that it’s something to be aware of.
Principal-agent problems certainly matter! But despite that, collaboration based on extrinsic rewards (instead of selfless agreement on every detail) has been a huge success story for mankind. Is our task unusually prone to principal-agent problems, compared to other tasks? In my experience, the opposite is true: AI alignment research is unusually easy to evaluate in detail, compared to checking the work of a contractor building your house or a programmer writing code for your company.
If the latter gets too large, then you start getting swarmed with people who want money and prestige but don’t necessarily understand how to contribute, and who are incentivized to degrade the signal of what’s actually important.
During this decade, the field of AI in general has become one of the most prestigious and high-status academic fields to work in. But as far as I can tell, that hasn’t slowed down the rate of progress in advancing AI capability. If anything, it has sped it up, by quite a bit. It’s possible that a lot of newcomers to the field are largely driven by the prospect of status gain and money, and there are quite a few hype-driven “AI” startups that have popped up and seem doomed to fail, but despite this, none of it seems to be slowing the pace of the most productive research groups. Maybe the key here is that if you suddenly increase the prestige of a scientific field by a dramatic amount, you are bound to get a lot of nonsense or fraudulent activity, but this might stay confined to the margins, outside of serious research circles. And the most serious people working in the field are likely to be helped by the rising tide as well, through increased visibility and funding for their labs and so on.
It’s also my understanding that the last few years (during the current AI boom) have been some of the most successful (financially and productively) for MIRI in their entire history.
This is an interesting point I hadn’t considered. Still mulling it over a bit.
I don’t think we disagree. The top-level comment said
experimented with hiring people to work on serious technical problems
and you said
You don’t have to think of it as hiring
So you were talking past each other a bit.
Also, you said:
When I started working on this topic, I wasn’t much invested in it, just wanted to get attention on LW. Wei has admitted to a similar motivation...
I feel like many people here focus too much on removing barriers and too little on creating incentives.
I think we can create a strong incentive landscape on LW for valuable work to be done (in alignment and in other areas); it’s just very important to get it right, and not to build something that can easily fall prey to adversarial Goodhart (or even just plain old regressional Goodhart). I’m very much pro creating incentives (I’ve been thinking a lot lately about how to do that, have a few ideas that I think are good for this, and will write them up for feedback from y’all when I get a chance).