Excellent post, big upvote. We need more discussion of the cognitive biases affecting the alignment debate. The eye needs to see its own flaws in order to make progress fast enough to survive AGI.
It’s curious that I disagree with you on many particulars but agree with you on the most important points.
Outside-view is overrated
Agreed on the main logic: it’s tough to guess how many opinions are just duplicates.
Safer? Maybe, but we also overvalue “having your own opinion,” even when it isn’t informed and well thought out.
Arguments about P(doom) are filtered for nonhazardousness
Agreed, and it’s important. But I think the cause is mostly different: most people with a low estimate are succumbing to motivated reasoning. They very much want to believe they and everything they love won’t die. So any alternative feels better, and they wind up believing it.
Confusion about the problem often leads to useless research
Very much agreed. This is also true of every other scientific and engineering field I know about. People like doing research more than they like figuring out what research is important. To your specific points:
What are human values?
Doesn’t matter. The only human values that matter are those of the people in charge of the first ASI.
Aligned to whom?
I think this matters an awful lot, because there are sadists and sociopaths in positions of power. As long as the team in charge of the ASI has a positive balance of empathy over sadism, we’ll like their utopia just fine, at least once they’ve had time to think it through.
What does it mean for something to be an optimizer?
Optimization doesn’t matter. What matters is pursuing goals more competently than humans can.
Okay, unaligned ASI would kill everyone, but how?
Agreed that it doesn’t matter. You can keep a smarter thing contained or controlled for a while, but eventually it will outsmart you and do whatever it wants.
What about multipolar scenarios?
Disagree: value handshakes aren’t practically possible. The competing ASIs go to war, and you die as collateral damage.
More importantly, it doesn’t matter which of those is right. Multipolar scenarios are worse, not better.
What counts as AGI, and when do we achieve that?
What counts is the thing that can outsmart you. When we achieve it matters, but only a little, since that mostly determines how much time we have to plan and prepare. How we achieve it matters a lot, because technical alignment plans are specific to AGI designs.
Thanks for the post on an important topic!
Can you say more about what causes you to believe this?
If the AGI winds up just wanting a lot of paperclips, and also wanting a lot of stamps, it may not have a way to decide exactly how much of each to go for, or when “a lot” has been achieved. But there’s a common obstacle to those goals: humans saying “hell no” and shutting it down when it starts setting up massive paperclip factories. Therefore it has a new subgoal: prevent humans from interfering with it. That probably involves taking over the world or destroying humanity.
If the goal is strictly bounded at some millions of tons of paperclips and stamps, then negotiating with humanity might make more sense. But if the vague goal is large, it implies all of the usual dangers to humanity, because its plans are incompatible with how we want to use the earth, and because we may not want an ASI with strange goals around at all, even if it’s only making paperclips on the moon.