Some information:
I don’t intend to engage
I won’t be responding to this thread until after the competition ends. If you’re not sure about something in the question, assume what (you think) I would assume. Feel free to argue for a specific interpretation in a comment if you think it’s underspecified.
Why I chose this question
In any realistic plan that ends with “and then we deployed the AI knowing we had mitigated risk X”, it seems to me like we need prestigious AI researchers to have some amount of consensus that X was actually a real risk. If you want a company to use a safer technique, the key decision-makers at the company will need to believe that it’s necessary in order to “pay the alignment tax”, and their opinions will be shaped by the higher-up AI researchers at the company. If you want a government to regulate AI, you need to convince the government that X is a real risk, and the government will probably defer to prestigious AI experts.
So it seems like an important variable is whether the AI community agrees that X is a real risk. (Also obviously whether in reality X is a real risk, but I’m assuming that it is for now.) It may be fine if it’s only the AGI researchers—if a company knows it can build AGI, it probably ignores the opinions of people who think we can’t build AGI.
Hence, this question. While an answer to the explicit question would be interesting to me, I’m more excited to have better models of what influences the answer to the question.
Prior reasoning
Copied over from the Google Doc linked above. Written during the 30 minutes I had to come up with a prior.
Seems very unlikely that if I ran this survey now the fraction would be 0.5. Let’s give that 0.1%, which I can effectively ignore.
My timelines for “roughly human-level reasoning” have a median of, let’s say, 20 years; it seems likely we get significant warning shots a few years before human-level reasoning, which lead to increased interest in safety. It’s not crazy that we get a warning shot in the next year from e.g. GPT-3, though I’d be pretty surprised; call it ~1%.
My model for how we get to consensus is that there’s a warning shot, or some excellent paper, or a series of these sorts of things, that increases the prominence of safety concerns, especially among “influencers”, and then this percolates to the general AI community over the next few years. I do think it can percolate quite quickly; see e.g. the fairly large effects of Concrete Problems in AI Safety or how quickly fairness + bias became mainstream. (That’s fewer examples than I’d like...) So let’s expect consensus 1-10 years after the first warning shot. Given median timelines of 20 years, a warning shot shortly before then, and 1-10 years to reach consensus, probably the median for this question should also be ~20 years? (Note that even if we build human-level reasoning before a majority is reached, the question could still resolve positively after that, since human-level reasoning != AI researchers being out of a job.)
There’s also a decent chance that we don’t get a significant warning shot before superintelligent AI (maybe 10%, e.g. via fast takeoff or via good safety techniques that don’t scale to AGI), or that not enough people update on it, consensus building takes forever, or the population I chose just doesn’t pay attention to safety for some reason (maybe another 15%), so let’s say 25% that it never happens. Add another ~10% for AGI / warning shots not happening before 2100, the safety community disappearing before 2100, etc. So that’s 35% on >2100 (which includes never).
So 25% for never leaves 75% for “at some point”, of which I argued the median is ~20 years; that means I should have ~35% from now till 20 years, and 30% from 20 years till 2100 (since 10% is on >2100 but not never).
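As a sanity check on that bookkeeping, here is a minimal sketch in Python; the bucket names are illustrative, and the only numbers used are the ones stated above:

```python
# Probability buckets from the reasoning above (numbers as stated in the text).
p_never = 0.25                 # no consensus ever (10% no warning shot + 15% no uptake, etc.)
p_after_2100_not_never = 0.10  # consensus happens, but only after 2100
p_ever = 1 - p_never           # 0.75 for "at some point"

# The median of the "at some point" distribution is ~20 years, so roughly half
# of that 75% falls before the 20-year mark (rounded to 35% here).
p_before_20y = 0.35
p_20y_to_2100 = p_ever - p_before_20y - p_after_2100_not_never  # = 0.30
p_after_2100_total = p_after_2100_not_never + p_never           # = 0.35

assert abs(p_before_20y + p_20y_to_2100 + p_after_2100_total - 1.0) < 1e-9
```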
Then I played around with Elicit until I got a distribution that fit these constraints and looked vaguely lognormal.
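For concreteness, here is a minimal sketch (not the actual Elicit workflow, and assuming 2100 is treated as roughly 80 years out) of backing out a lognormal consistent with those constraints, conditional on consensus ever happening:

```python
# Sketch: fit a lognormal over "years until consensus", conditional on consensus
# ever happening, to the two CDF constraints implied above:
#   P(T <= 20) ~= 0.35 / 0.75   (mass before the ~20-year median point)
#   P(T <= 80) ~= 0.65 / 0.75   (2100 treated as ~80 years out -- an assumption)
import numpy as np
from scipy.stats import lognorm, norm

t1, p1 = 20.0, 0.35 / 0.75
t2, p2 = 80.0, 0.65 / 0.75

# For a lognormal, P(T <= t) = Phi((ln t - mu) / sigma), so after applying the
# inverse normal CDF the two constraints are linear in mu and sigma.
z1, z2 = norm.ppf(p1), norm.ppf(p2)
sigma = (np.log(t2) - np.log(t1)) / (z2 - z1)
mu = np.log(t1) - sigma * z1

conditional_dist = lognorm(s=sigma, scale=np.exp(mu))
print(conditional_dist.cdf(20), conditional_dist.cdf(80))  # recovers ~0.47 and ~0.87
print(conditional_dist.median())                           # roughly 20-something years
```

The full forecast is then a mixture: this conditional distribution with weight 0.75, plus the “never” outcome with weight 0.25.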