Generally, I don’t see why we should expect that the most capable systems that can be created with unrestricted supervised learning (e.g. by using RL to search over an arbitrary space of NN architectures) would perform similarly to the most capable systems that can be created, at around the same time, using some restricted form of supervised learning that humans can trust to be safe. My prior is that the former is very likely to outperform the latter by a lot, and I’m not aware of strong evidence pointing one way or the other.
This seems similar to my view, which is that if you try to optimize for just one thing (efficiency), you’re probably going to end up with more of that thing than if you try to optimize for two things at the same time (efficiency and safety), or if you try to optimize for that one thing under a heavy constraint (i.e., safety).
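As a rough formalization of this intuition (purely illustrative; here capability(x) stands in for whatever efficiency measure is being optimized, and S for the set of systems we can verify as safe):

$$\max_{x}\,\mathrm{capability}(x)\;\ge\;\max_{x \in S}\,\mathrm{capability}(x)$$

Imposing the safety constraint can never increase the attainable capability; the open question is how large the gap is in practice.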
But there are people (like Paul) who seem to be more optimistic than this based on more detailed inside-view intuitions, which makes me wonder if I should defer to them. If the answer is no, there’s also the question of how we can make policymakers take this problem seriously (i.e., that safe AI probably won’t be as efficient as unsafe AI), given the existence of more optimistic AI safety researchers, so that they’d be willing to undertake costly preparations for governance solutions ahead of time. By the time we get conclusive evidence one way or the other, it may be too late to make such preparations.
If the answer is no, there’s also the question of how we can make policymakers take this problem seriously (i.e., that safe AI probably won’t be as efficient as unsafe AI), given the existence of more optimistic AI safety researchers (so that they’d be willing to undertake costly preparations for governance solutions ahead of time).
I’m not aware of any AI safety researchers that are extremely optimistic about solving alignment competitively. I think most of them are just skeptical about the feasibility of governance solutions, or think governance-related interventions might be necessary but shouldn’t be carried out yet.
In this 80,000 Hours podcast episode, Paul said the following:
In terms of the actual value of working on AI safety, I think the biggest concern is this, “Is this an easy problem that will get solved anyway?” Maybe the second biggest concern is, “Is this a problem that’s so difficult that one shouldn’t bother working on it or one should be assuming that we need some other approach?” You could imagine, the technical problem is hard enough that almost all the bang is going to come from policy solutions rather than from technical solutions.
And you could imagine, those two concerns maybe sound contradictory, but aren’t necessarily contradictory, because you could say, “We have some uncertainty about this parameter of how hard this problem is.” Either it’s going to be easy enough that it’s solved anyway, or it’s going to be hard enough that working on it now isn’t going to help that much, and so what mostly matters is getting our policy response in order. I think I don’t find that compelling, in part because, one, I think there’s significant probability on the range … like the place in between those, and, two, I just think working on this problem earlier will tell us what’s going on. If we’re in the world where you need a really drastic policy response to cope with this problem, then you want to know that as soon as possible.
It’s not a good move to be like, “We’re not going to work on this problem because if it’s serious, we’re going to have a dramatic policy response.” Because you want to work on it earlier, discover that it seems really hard and then have significantly more motivation for trying the kind of coordination you’d need to get around it.
I’m not aware of any AI safety researchers that are extremely optimistic about solving alignment competitively.
I’m not sure what you’d consider “extremely” optimistic, but I gathered some quantitative estimates of AI risk here, and they all seem overly optimistic to me. Did you see that?
Paul: I just think working on this problem earlier will tell us what’s going on. If we’re in the world where you need a really drastic policy response to cope with this problem, then you want to know that as soon as possible.
I agree with this motivation to do early work, but in a world where we do need drastic policy responses, I think it’s pretty likely that the early work won’t actually produce conclusive enough results to show that. For example, if a safety approach fails to make much progress, there’s not really a good way to tell if it’s because safe and competitive AI really is just too hard (and therefore we need a drastic policy response), or because the approach is wrong, or the people working on it aren’t smart enough, or they’re trying to do the work too early. People who are inclined to be optimistic will probably remain so until it’s too late.
but I gathered some quantitative estimates of AI risk here, and they all seem overly optimistic to me. Did you see that?
I have only now read that thread. I think it is extremely worthwhile to gather such estimates.
I think all three of the estimates mentioned there correspond to marginal probabilities (rather than probabilities conditioned on “no governance interventions”). So those estimates already account for scenarios in which governance interventions save the world. Therefore, it seems we should not strongly update against the necessity of governance interventions just because those estimates are optimistic.
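To make the distinction concrete (with made-up numbers, purely for illustration), the law of total probability gives:

$$P(\text{doom}) = P(\text{doom} \mid \text{no intervention})\,P(\text{no intervention}) + P(\text{doom} \mid \text{intervention})\,P(\text{intervention})$$

So a pessimistic conditional estimate like $P(\text{doom} \mid \text{no intervention}) = 0.5$ is compatible with a seemingly optimistic marginal estimate of $P(\text{doom}) \approx 0.15$, if one also expects governance interventions to be likely ($P(\text{intervention}) = 0.8$) and fairly effective ($P(\text{doom} \mid \text{intervention}) = 0.06$), since $0.5 \times 0.2 + 0.06 \times 0.8 \approx 0.15$.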
Maybe we should gather researchers’ credences for predictions like: “If there are no governance interventions, competitive aligned AIs will exist 10 years from now.”
I suspect that gathering such estimates from publicly available information might expose us to a selection bias, because very pessimistic estimates might be outside the Overton window (even for the EA/AIS crowd). For example, if Robert Wiblin had concluded that an AI existential catastrophe is 50% likely, I’m not sure that the 80,000 Hours website (which targets a large and motivationally diverse audience) would have published that estimate.

I strongly agree with all of this.
I agree with this motivation to do early work, but in a world where we do need drastic policy responses, I think it’s pretty likely that the early work won’t actually produce conclusive enough results to show that. For example, if a safety approach fails to make much progress, there’s not really a good way to tell if it’s because safe and competitive AI really is just too hard (and therefore we need a drastic policy response), or because the approach is wrong, or the people working on it aren’t smart enough, or they’re trying to do the work too early.
I think all three of the estimates mentioned there correspond to marginal probabilities (rather than probabilities conditioned on “no governance interventions”). So those estimates already account for scenarios in which governance interventions save the world. Therefore, it seems we should not strongly update against the necessity of governance interventions just because those estimates are optimistic.
I normally give ~50% as my probability we’d be fine without any kind of coordination.
Upvoted for giving this number, but what does it mean exactly? You expect “50% fine” through all kinds of x-risk, assuming no coordination from now until the end of the universe? Or just assuming no coordination until AGI? Is it just AI risk instead of all x-risk, or just risk from narrow AI alignment? If “AI risk”, are you including risks from AI exacerbating human safety problems, or AI differentially accelerating dangerous technologies? Is it 50% probability that humanity survives (which might be “fine” to some people) or 50% that we end up with a nearly optimal universe? Do you have a document that gives all of your quantitative risk estimates with clear explanations of what they mean?
(Sorry to put you on the spot here when I haven’t produced anything like that myself, but I just want to convey how confusing all this is.)