Even if the first researcher with the key insight into general AI is really “safety conscious”, we don’t automatically get friendly AI first. That’s roughly a 10x reduction in marginal value from the original model.
Being “safety conscious” correctly is really hard, and most of the 30% won’t be safety conscious in the way we want, even though they “know” they should be. That’s another 30x reduction in marginal value from the original model.
One big penalty that was discussed is the likelihood of another researcher having the key insight before the first researcher can leverage it into friendly AI. Throwing some crazy numbers down (i.e., a concrete albeit greatly simplified model): call it 1% “no one would possibly think of this before FAI”, 10% “there’s a 50% chance someone else will think of this in time to beat me if they’re unfriendly”, and 89% “this is an idea whose time has come; we save a couple of years on the first friendly researcher, a year on the next two, and months on the rest”, with the savings measured as a fraction of 50 years. That gives something like 0.02 + 0.065 + 0.048 = 0.133 of the original value, i.e., down by roughly a factor of 10.
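To make that arithmetic explicit, here is a minimal sketch (purely illustrative) that just sums the three per-scenario contributions as stated above; the individual values 0.02, 0.065, and 0.048 are the estimates from the text, not re-derived here.

```python
# Sketch of the weighted-scenario sum above. The per-scenario contributions
# are the estimates stated in the text; only the total and the implied
# reduction factor are computed here.
contributions = {
    "1%: no one would think of this before FAI": 0.02,
    "10%: 50% chance an unfriendly researcher beats me to it": 0.065,
    "89%: idea whose time has come (~2-3 of 50 years saved)": 0.048,
}

retained = sum(contributions.values())
print(f"retained fraction: {retained:.3f}")       # 0.133 of the original value
print(f"reduction factor: ~{1 / retained:.1f}x")  # ~7.5x, rounded to ~10x in the text
```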
(Edited) Another factor is whether being “safety conscious” about your key insight actually ends up gaining us anything; e.g., telling a collaborator you thought was okay but wasn’t loses some of the gains. I haven’t thought this through, but I wouldn’t be viscerally upset if someone said anywhere from 10% to 50% that being safety conscious works. (Edited) After reading Eliezer’s comment, I think I was confusing two things (and maybe others are too). There’s a spectrum of safety consciousness, and I don’t think all of those 30% of researchers convinced by 100 papers get to the 10%–50% level of “safety consciousness from them will work”. Maybe 2% get to 50%, 10% get to 10%, and 88% get to 2% or worse, call it 1%. That brings this factor down to about 3%.
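As a sanity check on that 3% figure, here is the same expected-value calculation written out; the 2%/10%/88% split and the 50%/10%/1% effectiveness levels are the guesses from the paragraph above.

```python
# Expected effectiveness of a "safety conscious" researcher, using the
# distribution proposed in the text: 2% of convinced researchers reach a 50%
# chance that their safety consciousness works, 10% reach 10%, and the
# remaining 88% reach roughly 1%.
distribution = [
    (0.02, 0.50),  # (fraction of researchers, effectiveness level)
    (0.10, 0.10),
    (0.88, 0.01),
]

expected = sum(frac * eff for frac, eff in distribution)
print(f"expected effectiveness: {expected:.3f}")  # 0.029, i.e. about 3%
```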
(Edited: this is a non-issue) There’s also the possibility of very negative consequences to buying up the bright grad students (I assume we need the bright ones to get good papers produced in the right journals). I don’t know if this is actually any concern at all to those with at least some intuition into the matter—I have no such intuition. (Edited) This came from my thought: “if it were generally well known that a relatively small group of people was trying to buy up 30% of the AI research, might that cause a social backlash?”, which is just flat-out wrong: we’re trying to write 100 papers to convince 30% of the community, not actually buy 30% of the research. :)
In the other direction, there’s the possibility that “100 good papers” leads to “30% convinced”, which then balloons upwards to 80% due to networking and no-longer-non-mainstream effects. (Edited: if this happens, it gives us a factor of about 1.5, so its total contribution is pretty small unless it’s very likely.)
Oh, and there’s the expected time until FAI as compared to GAI… if FAI takes too much longer, we only get a benefit from the 1% piece of that model, which would put us down by an extremely unstable (1% plus or minus 1% ;)) factor of 50. (Edited) Let’s put some crazy numbers down and say there’s a 25% chance FAI is so much friggin’ harder than GAI, plus a 10% chance FAI is actually just impossible, for a 35% chance we only get the benefits from the 1% (plus or minus 1%) piece of the “delaying general AI” model. My other intuitions were coming from FAI being really hard, but not a century harder than GAI. This takes my original 0.02 + 0.065 + 0.048 = 0.133 down to 0.35 × 0.02 + 0.65 × 0.133 ≈ 0.093, which is still about a 10x factor.
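And the blended estimate written out the same way; the 35% (= 25% “FAI much harder” + 10% “FAI impossible”) and the 0.02 and 0.133 values are the figures from the paragraph above.

```python
# Blend of the two cases: with probability 0.35 (FAI much harder or impossible)
# we only keep the 1% piece (~0.02 of the original value); otherwise we keep
# the full 0.133 estimated earlier.
p_fai_much_harder = 0.25 + 0.10  # 25% "much harder" + 10% "impossible" = 0.35

blended = p_fai_much_harder * 0.02 + (1 - p_fai_much_harder) * 0.133
print(f"blended retained fraction: {blended:.3f}")  # ~0.093 (the 0.0935 in the text)
print(f"reduction factor: ~{1 / blended:.0f}x")     # ~11x, i.e. still about a 10x factor
```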
Anyone with better intuitions/experience/(gasp)knowledge want to redo those numbers or note why one of the models is terribly broken or brainstorm other yet-unmentioned factors?
There’s also “safety conscious actually works” which I have no idea about but wouldn’t laugh at even up to 50%ish.
I don’t understand this sentence. Please explain.
There’s also the possibility of very negative consequences to buying up the bright grad students (I assume we need the bright ones to get good papers produced in the right journals).
What negative consequences?
Edited. I would guess that “being safety conscious” isn’t enough to guarantee good effects, and we only get some fraction of the benefit that an Ideal Safety Conscious Researcher would give.
The negative consequences I was thinking of are in retrospect based on a silly error.
Thanks for pointing those out!