I don’t think the widening of the Overton Window for AI risk is a good thing, primarily because we overrate bad news compared to good news, there’s actual progress on AGI alignment, and I expect people to take away a more negative view of AGI development than is warranted.
Here’s a link on the problem of negative news overwhelming positive news:
https://www.vox.com/the-highlight/23596969/bad-news-negativity-bias-media
In general, I think that we will almost certainly make it through AGI and ASI by aligning them. In particular, the ratio of capabilities progress to safety progress is arguably suboptimal, in the sense that there’s slack: we could accelerate AGI progress further and alignment could still largely be achieved.
I said elsewhere earlier: “AGI has the power to destroy the entire human race, and if we believe there’s even a 1% chance that it will, then we have to treat it as an absolute certainty.”
And I’m pretty sure that no expert puts it below 1%.
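The quoted reasoning is essentially an expected-value argument: when the stakes are large enough, even a small probability dominates the calculation. A minimal sketch (the loss figure below is a hypothetical illustration, not a claim from this thread):

```python
# Expected-value sketch of the "even a 1% chance" argument.
# All numbers are illustrative assumptions, not figures from the discussion.

p_catastrophe = 0.01           # the quoted "even a 1% chance"
loss_if_catastrophe = 8e9      # hypothetical stakes (e.g. lives at risk)
loss_otherwise = 0.0

# Expected loss is dominated by the low-probability, high-stakes branch.
expected_loss = (p_catastrophe * loss_if_catastrophe
                 + (1 - p_catastrophe) * loss_otherwise)
print(expected_loss)  # 80,000,000 in the same hypothetical units
```

The point of the sketch is only that a 1% probability multiplied by astronomically large stakes still yields an enormous expected loss, which is why the quote urges treating the risk as if it were certain.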
I feel similarly, and what confuses me is that I had a positive view of AI safety back when it was about being pro-safety, pro-alignment, pro-interpretability, etc. These are good things that were neglected, and it felt good that there were people pushing for them.
But at some point it changed, becoming more about fear and opposition to progress. Anti-open source (most obviously with OpenAI, but even OpenRAIL isn’t OSI), anti-competition (via regulatory capture), anti-progress (via as-yet-unspecified means). I hadn’t appreciated the sheer darkness of the worldview.
And now, with the mindshare the movement has gained among the influential, I wonder what if it succeeds. What if open source AI models are banned, competitors to OpenAI are banned, and OpenAI decides to stop with GPT-4? It’s a little hard to imagine all that, but nuclear power was killed off in a vaguely analogous way.
Pondering the ensuing scenarios isn’t too pleasant. Does AGI get developed anyway, perhaps by China or by some military project during WW3? (I’d rather not either, please.) Or does humanity fully cooperate to put itself into a sort of technological stasis of indefinite duration?
it cannot occur as long as GPUs exist. we’re getting hard foom within 8 years, rain or shine, as far as I can tell; most likely within 4, if we do nothing to stabilize then it’s within 2, if we keep pushing hard then it’ll be this year.
let’s not do it this year. the people most able to push towards hard foom won’t need to rush as much if folks slow down. We’ve got some cool payouts from ai, let’s chill out slightly for a bit. just a little—there’s lots of fun capabilities stuff to do that doesn’t push towards explosive criticality, and most of the coolest capability stuff (in my view) makes things easier, not harder, to check for correctness.
Alignment, safety, and interpretability work is continuing at full speed, but if all the efforts of the alignment community are only sufficient to avoid the destruction of the world by 2042, and AGI is created in 2037, then at the end you get a destroyed world.
It might not be possible in real life (List of Lethalities: “we can’t just decide not to build AGI”), and even if possible it might not be tractable enough to be worth focusing any attention on, but it would be nice if there were some way to make sure that AGI happens only after alignment work, proceeding at full speed, is sufficient (EDIT: or, failing that, for AGI to happen later, so that if alignment goes quickly it takes the world from bad outcomes to good outcomes, instead of from bad outcomes to bad outcomes).
The good news is that I expect AI development to be de facto open, even if not de jure open, for the following reason:
AI labs still need to publish enough, at least at the level of high-level summaries or abstractions, to succeed in the marketplace and politically.
OpenAI (et al.) could try to push as much as possible of the actual work-performing details of their models’ engineering design into low-level implementation details that remain closed-source, and even base their designs on principles that make such a strategy more feasible. But I believe that this will not work.
This is due to more fundamental constraints on how successful AI models have to be structured: even the high-level, abstract summaries of how they work must reliably map onto the reasons the model performs well (this is akin to the Natural Abstraction hypothesis).
Therefore, advanced AI models could in principle be feasibly reverse-engineered or re-developed from the high-level details that are published.
Yeah, even if there has been that kind of progress in alignment, I don’t see anyone publicizing that 50% of experts giving a 10% chance of existential catastrophe is an improvement over what the situation was before they started reporting on it. I don’t think they could tell that story even if they wanted to, not in a way that would actually educate the population generally.
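A summary like “50% of experts give a 10% chance” is just a quantile of the distribution of per-expert probability estimates, which is part of why it is hard to communicate whether such a number represents improvement. A minimal sketch (the survey responses below are invented for illustration, not real data):

```python
# Hypothetical per-expert estimates of p(existential catastrophe).
# These numbers are invented; they only illustrate how a headline
# statistic like "50% of experts give at least a 10% chance" is
# computed from a distribution of individual answers.
estimates = [0.02, 0.05, 0.08, 0.10, 0.15, 0.30]

# Fraction of experts whose estimate is at or above 10%.
at_least_10 = sum(e >= 0.10 for e in estimates) / len(estimates)
print(at_least_10)  # 0.5, i.e. "50% of experts give at least a 10% chance"
```

The same distribution could shift substantially (say, every estimate halving) while leaving the headline fraction unchanged, which illustrates why such summaries are a poor vehicle for communicating progress.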