The capabilities people have thrown a challenge right back at you!
Convincing All Alignment Researchers
Give the world’s hundred most respected AI alignment researchers $1M each to spend 3 months explaining why AI will be misaligned, with an extra $100M if by the end they can propose an argument capability researchers can’t shoot down. They probably won’t make any progress, but from then on when others ask them whether they think alignment is a real unsolved problem, they will be way more likely to say no. That only costs you a hundred million dollars!
I don’t think I could complete this challenge, yet I also predict that I would not then say that alignment is not a real unsolved challenge. I mostly expect the same problem from the OP proposal, for pretty similar reasons.
For the record I think this would also be valuable! If as an alignment researcher your arguments don’t survive the scrutiny of skeptics, you should probably update away from them. I think maybe what you’re highlighting here is the operationalization of “shoot down”, which I wholeheartedly agree is the actual problem.
Re: the quantities of funding, I know you’re being facetious, but just to point it out, the economic values of “capabilities researchers being accidentally too optimistic about alignment” and of “alignment researchers being too pessimistic about alignment” are asymmetric.
If as an alignment researcher your arguments don’t survive the scrutiny of skeptics, you should probably update away from them.
If that’s your actual belief, you should probably update away now. People have tried to do this for years, and in fact in most cases the skeptics were not convinced.
Personally, I’m much more willing to say that they’re wrong and so I don’t update very much.
Re: the quantities of funding, I know you’re being facetious, but just to point it out, the economic values of “capabilities researchers being accidentally too optimistic about alignment” and of “alignment researchers being too pessimistic about alignment” are asymmetric.
Yeah, that was indeed just humor, and I agree with the point.
I was referring to your original statements. I think you might be construing my statement as that you should take AI risk less seriously if you can’t convince the skeptics, as opposed to if the skeptics can’t convince you.
Ah, I see, that makes more sense, sorry for the misunderstanding.
(Fwiw I and others have in fact talked with skeptics about alignment.)
Hmmm, I wonder if there’s a version like this that would actually be decent at ‘getting people to grapple with’ the ideas? Like, if you got Alignment Skeptic Alice to judge the ELK prize submissions with Paul Christiano and Mark Xu (presumably by paying her a bunch of money), would Alice come away from the experience thinking “actually ELK is pretty challenging” or would she come away from it thinking “Christiano and Xu are weirdly worried about edge cases that would never happen in reality”?
would Alice come away from the experience thinking “actually ELK is pretty challenging” or would she come away from it thinking “Christiano and Xu are weirdly worried about edge cases that would never happen in reality”?
If Alice is a skeptic because she actually has thought about it for a while and come to an opinion, it will totally be the latter.
If Alice has just not thought about it much, and she believes that AGI is coming, then she might believe the former. But in that case I think you could just have Alice talk to e.g. me for, say, 10 hours, and I’d have a pretty decent chance of convincing her that we should have more investment in AGI alignment.
(Or to put it another way: just actually honestly debating with the other person seems a lot more likely to work than immersing them in difficult alignment research—at least as long as you know how to debate with non-rationalists.)
Your time isn’t scalable, though; there are well over 10,000 replacement-level AI researchers.
Of course, but “judge the ELK prize submissions with Paul Christiano and Mark Xu” is also not a scalable approach, since it requires a lot of conversation with Paul and Mark?
I think we agree that scaling is relative and more is better!
God, absolutely, yes, do I get to talk to the sceptic in question regularly over the three months?
Given three months of dialogue with someone who thinks like me about computers and maths, in which we both promise to take each other’s ideas seriously, if I haven’t changed his mind far enough to convince him that there are serious reasons to be scared, he will have changed mine.
I have actually managed this with a couple of sceptic friends, although the three months of dialogue has been spread out over the last decade!
And I don’t know what I’m talking about. Are you seriously saying that our best people can’t do this?! Eliezer used to make a sport of getting people to let him out of his box, and he has always been really, really good at explaining complicated thoughts persuasively.
Maybe our arguments aren’t worth listening to. Maybe we’re just wrong.
Give me this challenge!! Nobody needs to pay me, I will try to do this for fun and curiosity with anyone on the other side who is open-minded enough to commit to regular chats. An hour every evening?
In person would be better, so Cambridge or maybe London? I can face the afternoon train for this.
I think it’s totally doable (and I have done it myself) to convince people who haven’t yet staked a claim as an Alignment Skeptic. There are specific people, such as Yann LeCun, who are publicly skeptical of alignment research; they are the ones I imagine we could not convince.
OK, I myself am sceptical of alignment research, but not at all sceptical of the necessity for it.
Do you think someone like Eliezer has had a proper go at convincing him that there’s a problem? Or will he just not give us the time of day? Has he written anything coherent on the internet that I could read in order to see what his objections are?
Personally I would love to lose my doom-related beliefs, so I’d like to try to understand his position as well as I can for two reasons.
Here’s an example.
Great, thanks, so I’m going to write down my response to his thoughts as I hear them:
Before reading the debate I read the Scientific American article it’s about. On a first read it seems convincing: OK, relax! And then take a closer look.
What’s he saying (paraphrasing stuff from Scientific American):
Superintelligence is possible, can be made to act in the world, and is likely coming soon.
Intelligence and goals are decoupled.
Why would a sentient AI want to take over the world? It wouldn’t.
Intelligence per se does not generate the drive for domination.
Not all animals care about dominance.
I’m a bit worried by mention of the first law of robotics. I thought the point of all those stories was all the ways such laws might lead to weird outcomes.
Blah blah joblessness, military robots, blah inequality, all true, I might even care if I thought there was going to be anyone around to worry about it. But it does mean that he’s not a starry-eyed optimist who thinks nothing can go wrong.
That’s great! I agree with all of that, it’s often very hard to get people that far. I think he’s on board with most of our argument.
And then right at the end (direct quote):
Even in the worst case, the robots will remain under our command, and we will have only ourselves to blame.
OK, so he thinks that because you made a robot, it will stay loyal to you and follow your commands. No justification given.
It’s not really fair to close-read an article in a popular magazine, but at this point I think maybe he realises that you can make a superintelligent wish-granting machine, but hasn’t thought about what happens if you make the wrong wish and want to change it later.
(I’m supposed to not be throwing mythological and literary references into things any more, but I can’t help but think about the Sibyl, rotting in her jar because Apollo had granted her eternal life but not eternal youth, or T. S. Eliot’s “That is not it at all, That is not what I meant, at all.”)
So let’s go and look at the debate itself, rather than the article.
I was talking to someone recently who had talked to Yann and got him to agree with very alignment-y things, but then a couple of days later Yann was saying very capabilities-y things instead.
The “someone”’s theory was that Yann’s incentives and environment all push towards capabilities research.