Governments are not known for changing their policies in response to carefully reasoned arguments, nor do they impose proactive restrictions on a technology without an extensive track record of it having large negative side effects. For governments to take the sort of actions that could have a meaningful impact on AI timelines, a big newsworthy event would need to happen, something basically on the scale of 9/11 or larger.
I think steering capabilities research in directions that are likely to yield “survivable first strikes” would be very good and could create common knowledge about the necessity of alignment research. I think GPT-3 derivatives have potential here: they are roughly capped in capability by being trained to mimic human output, yet they’re strong enough that a version unleashed on the internet could cause harm that is survivable but obvious. Basically, we need to maximise the distance between “model strong enough to cause survivable harm” and “model strong enough to wipe out humanity” in order to give humanity time to respond after the coordination-inducing event.
It would need to be the sort of harm that is highly visible and concentrated in time and space, like 9/11, not something like “increasing the incidence of cancer worldwide by 10%” or “driving people insane with polarizing news feeds”.
A thought: could we already have a case study ready for us?
Governments around the world are talking about regulating tech platforms. Arguably, Facebook’s News Feed is an AI system, and the current narrative is that it’s causing mass societal harm because it optimizes for clicks/likes/time-on-Facebook/whatever rather than for human values.
See also:
This story about how Facebook engineers tried to make tweaks to the News Feed algorithm’s utility function and it backfired.
This story about how Reddit’s recommendation algorithms may have influenced some of the recent stock market craziness.
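To make the “optimizing for engagement rather than human values” point concrete, here is a minimal toy sketch, nothing like Facebook’s actual system; the item names and scores are invented, of how ranking purely on an engagement proxy can systematically push low-value content to the top:

```python
# Toy illustration of a proxy objective: the ranker only sees predicted
# engagement, not the (hypothetical) "human value" of each item.
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    predicted_engagement: float  # the proxy metric the ranker optimizes
    human_value: float           # what we actually care about (invisible to the ranker)

feed = [
    Item("Outrage-bait political post", predicted_engagement=0.95, human_value=0.10),
    Item("Friend's wedding photos",     predicted_engagement=0.40, human_value=0.90),
    Item("Local volunteering event",    predicted_engagement=0.20, human_value=0.80),
]

# Ranking on the proxy alone puts the lowest-value item at the top of the feed.
ranked = sorted(feed, key=lambda item: item.predicted_engagement, reverse=True)
for item in ranked:
    print(f"{item.title}: engagement={item.predicted_engagement:.2f}, value={item.human_value:.2f}")
```

The gap between the two columns is the alignment problem in miniature: the metric the system actually optimizes is only a proxy for what its designers and users care about.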
All we’d have to do is to convince people that this is actually an AI alignment problem.
That’s gonna be really hard; people like Yann LeCun (head of Facebook AI) see these problems as evidence that alignment is actually easy: “See, there was a problem with the algorithm, we noticed it and we fixed it, what are you so worried about? This is just a normal engineering problem to be solved with normal engineering means.” Convincing them that this is actually an early manifestation of a fundamental difficulty that becomes deadly at high capability levels will be really hard.
Do we have to convince Yann LeCun? Or do we have to convince governments and the public?
(Though I agree that the word “All” is doing a lot of work in that sentence, and that convincing people of this may be hard. But possibly easier than actually solving the alignment problem?)
That’s how you turn a technical field into a cesspit of social commentary and political virtue signaling.
Think less AGI-Overwatch committee or GPU-export ban and more “Big business bad!”, “AI racist!”, “Human greed the real problem!”