In my opinion we just haven’t done a very good job.
I mean, I agree that we’ve failed at our goal. But “haven’t done a very good job” implies to me something like “it was possible to not fail”, which, unclear?
We’ve seen plenty of people jump on the AI safety bandwagon.
Jumping on the bandwagon isn’t the important thing. If anything, it’s made things somewhat worse; consider the reaction to this post if MIRI were ‘the only game in town’ for alignment research as opposed to ‘one lab out of half a dozen.’ “Well, MIRI’s given up,” someone might say, “good thing ARC, DeepMind, OpenAI, Anthropic, and others are still on the problem.” [That is, it’s easy to have a winner’s curse of sorts, where the people most optimistic about a direction will put in work on that direction for the longest, and so you should expect most work to be done by the overconfident instead of the underconfident.]
Like, if someone says “I work on AI safety, I make sure that people can’t use language or image generation models to make child porn” or “I work on AI safety, I make sure that algorithms don’t discriminate against underprivileged minorities”, I both believe 1) they are trying to make these systems not do a thing broader society disapproves of, which is making ‘AI’ more ‘safe’, and 2) this is not attacking the core problem, and will not generate inroads to attacking the core problem.
I think the core problem is technical, mathematical, and philosophical. We need to design systems that have lots of complicated properties, and we’re not quite sure what those properties are or whether it’s possible to combine them. The things that people are building look like they’re not going to have those properties. I think there are lots of things that coordination projects have achieved in the past, but none of them look like that.
We can get lots of people to show up at protests and chant slogans in unison. I’m not sure how this solves technical problems.
We can get lots of corporations to have marketing slogans that promote safety. I’m again not sure how this solves technical problems, or how it would be the case that they have ‘real decision-making power’ instead of something more like ‘make it less likely we get sued’.
We can get government regulation of AI, and that means that 1) there will be a race to the bottom across jurisdictions, unless they decide to take it as seriously as they take nuclear proliferation, and 2) the actual result will be that companies need large compliance departments in order to develop AI systems, and those compliance departments won’t be able to tell the difference between dangerous and non-dangerous AI.

I endorse this reply.
“I mean, I agree that we’ve failed at our goal. But “haven’t done a very good job” implies to me something like “it was possible to not fail”, which, unclear?”
Of course it was. Was it difficult? Certainly. So difficult that I don’t blame anyone for failing, as I’ve stated in my comment replying to this post.
It’s an extremely difficult problem both technically and politically/socially. The difference is that I don’t see any technical solutions, and I have also heard very convincing arguments from the likes of Roman Yampolskiy that such a thing might not even exist. But we can all agree that there is at least one political solution: to not build advanced AIs before we’ve solved the alignment problem. No matter how extremely difficult such a solution might seem, it does exist and seems possible.
So we’ve failed, but I’m not blaming anyone because it’s damn difficult. In fact I have nothing but the deepest admiration for the likes of Eliezer, Bostrom and Russell. But my critique still stands: such failure (to get the leaders to care, not the technical failure to solve alignment) COULD be IN PART because most prominent figures like these 3 only talk about AI x-risk and not worse outcomes.
“unless they decide to take it as seriously as they take nuclear proliferation”
That’s precisely what we need. I’d assume that most in this community are quite solidly convinced that “AI is far more dangerous than nukes” (to quote our friend Elon). If leaders could adopt our reasoning, it could be done.
“the actual result will be that companies need large compliance departments in order to develop AI systems, and those compliance departments won’t be able to tell the difference between dangerous and non-dangerous AI.”
There are other regulatory alternatives, like restricting access to supercomputers, or even stopping AI research altogether until we’ve made much more progress on alignment. Your concern is still completely legitimate. But where are the technical solutions in sight, as an alternative? Should we rather risk dying (again, that’s not even the worst risk) because political solutions seem intractable, and pursue only technical solutions, when those seem even more intractable?
Contingency measures, both technical and political, could also be more effective than both full alignment and political solutions.
But my critique still stands: such failure (to get the leaders to care, not the technical failure to solve alignment) COULD be IN PART because most prominent figures like these 3 only talk about AI x-risk and not worse outcomes.
From the point of view of most humans, there are few outcomes worse than extinction of humanity (x-risk). Are you implying that most leaders would prefer extinction of humanity to some other likely outcome, and could be persuaded if we focused on that instead?
I strongly suspect there are some that do have such preferences, but I also think that those unpersuaded by the risk of extinction wouldn’t be persuaded by any other argument anyway.
“From the point of view of most humans, there are few outcomes worse than extinction of humanity (x-risk).”
That’s obviously not true. What would you prefer: extinction of humanity, or permanent Holocaust?
“Are you implying that most leaders would prefer extinction of humanity to some other likely outcome, and could be persuaded if we focused on that instead?”
Anyone would prefer extinction to, say, a permanent Holocaust. Anyone sane, at least. But I’m not implying that they would prefer extinction to a positive outcome.
“but I also think that those unpersuaded by the risk of extinction wouldn’t be persuaded by any other argument anyway”
I’ll ask you again: which is worse, extinction or permanent Holocaust?
Note that I didn’t say that there are no outcomes that are worse than extinction. That said, I’m not convinced that permanent Holocaust is worse than permanent extinction, but that’s irrelevant to my point anyway. If someone isn’t convinced by the risk of permanent extinction, are you likely to convince them by the (almost certainly smaller) risk of permanent Holocaust instead?
“That said, I’m not convinced that permanent Holocaust is worse than permanent extinction, but that’s irrelevant to my point anyway.”
Maybe it’s not. What we guess other people’s values to be is heavily influenced by our own values. And if you are not convinced that permanent Holocaust is worse than permanent extinction, then, no offense, but you have a very scary value system.
“If someone isn’t convinced by the risk of permanent extinction, are you likely to convince them by the (almost certainly smaller) risk of permanent Holocaust instead?”
Naturally, because the latter is orders of magnitude worse than the former. But again, if you don’t share this view, I can’t see myself convincing you.
And we also have no idea if it really is smaller. But even a small risk of an extremely bad outcome is reason for high alarm.
What exactly do you mean by permanent Holocaust? The way Wikipedia defines the Holocaust, it’s about the genocide of Jewish people. Other sources include the genocide of groups like Sinti and Roma as well.
While genocide is very bad, human extinction includes most of the evils of genocide as well, so I would not prefer human extinction.
I think he means the part where people were in ghettos/concentration camps.

Everyone knows that the Holocaust wasn’t just genocide. It was also torture, evil medical experiments, etc. But you’re right, I should have used a better example. Not that I think that anyone really misunderstood what I meant.
“We can get lots of people to show up at protests and chant slogans in unison. I’m not sure how this solves technical problems.”

In the case that there is someone on the planet who could solve alignment but still doesn’t know about the problem, this could be one of the ways to find them (we must estimate the probability of success for whichever path we take, and whether such people exist and where). Maybe a broad media campaign could bring more intellectual resources into technical safety research. And if it just accelerates the doom, weren’t we still doomed?
People should and deserve to know the probable future. A surviving timeline could come from a major social revolt, and a social revolt arising from the low probability of alignment is possible.
So we must evaluate the probability of success of:
1) Being loud about it, and how.
2) Keeping on trying relatively quietly.
And if it just accelerates the doom, weren’t we still doomed?
Again, unclear. Elsewhere on this post, Vanessa Kosoy comments that she thinks there’s a 30% chance of success. The more you accelerate the creation of unsafe AI, the less time Vanessa has to work, and so presumably the lower her chance is; perhaps attracting more talent helps, but past efforts in this direction don’t seem to have obviously been worth it.
Like, we talk a lot about ‘differential tech development’, or developing technology that makes it easier to create an aligned AGI more than it makes it easier to create an unaligned AGI. It would be really nice if “communicate to people about the risk” was this sort of thing—it really seemed like it should be, on first glance—but in retrospect I’m really not sure that it was, and moderately pessimistic about future efforts that aren’t carefully targeted.
People should and deserve to know the probable future.
I mean, it’s not like we hid the arguments; we posted them on the public internet, I got published in a magazine arguing about it, Eliezer went on the Sam Harris podcast and talked about it, Nick Bostrom wrote a book about it, went on Joe Rogan and talked about it, and talked to the UK government about it. Heck, someone (in an attempt to blackmail MIRI about LessWrong moderation decisions, as far as I can tell) convinced Glenn Beck that AGI was dangerous, mostly in an attempt to discredit it by association.
Like, I would be way more sympathetic to this line of argumentation if it engaged with past efforts and the various balances that everyone was trying to strike, rather than just saying “well, it didn’t work, have you tried more?”
Like, the Slaughterbots campaign was ‘more’, and most AI Alignment people I talk to think it was a bad idea because it probably alienated the military, and made it harder to win in worlds where the first close AGI project is a military project, without clearly delaying the timeline for either AGI or lethal autonomous weapons.
Similarly, I think you have to make a choice between ‘fomenting social revolt’ and ‘working with the technocratic apparatus’, and I haven’t seen much in the way of successful social revolt in developed countries recently. [If the Canadian trucker convoy had been protesting the introduction of autonomous trucks instead of vaccine mandates, would it have been any more successful? What if it was just ‘getting rid of algorithms’, which is what seems to be how simpler versions of these sorts of arguments come out when actually pushed thru the government?]
You have to admit that getting rid of algorithms would be a great help though. We might still be allowed to do some on paper, but that surely couldn’t be too dangerous?