So, if I were to point out what seems like the most dubious premise of the argument, it would be “An AGI that has at least one mistake in its alignment model will be unaligned.”
Deep Learning is notorious for continuing to work while there are mistakes in it; you can accidentally leave all kinds of things out and it still works just fine. There are of course arguments that value is fragile and that if we get it 0.1% wrong then we lose 99.9% of all value in the universe, just as there are arguments that the aforementioned arguments are quite wrong. But “one mistake” = “failure” is not a firm principle in other areas of engineering, so it’s unlikely to hold here either.
But, apart from whether individual premises are true or false: in general, it might help if, rather than asking yourself “What kind of arguments can I make for or against AI doom?”, you instead asked “hey, what would AI doom predict about the world, and are those predictions coming true?”
It’s a feature of human cognition that we can make what feel like good arguments for anything. The history of human thought before the scientific method is people making lots of good arguments for or against God’s existence, for or against the divine right of kings, and so on, and never finding out anything new about the world. But the world changed when some people gave up on that project and realized they needed to look for predictions instead; I really, really recommend the prior article.
> Deep Learning is notorious for continuing to work while there are mistakes in it; you can accidentally leave all kinds of things out and it still works just fine. There are of course arguments that value is fragile and that if we get it 0.1% wrong then we lose 99.9% of all value in the universe, just as there are arguments that the aforementioned arguments are quite wrong. But “one mistake” = “failure” is not a firm principle in other areas of engineering, so it’s unlikely to hold here either.
OK, the “An AGI that has at least one mistake in its alignment model will be unaligned” premise seems like the weakest one. Is there any agreement in the AI community about how much alignment is “enough”? I suppose it depends on the AI’s capabilities and how long you want to have it running for. Are there any estimates?
For example, if humanity wanted a 50% chance of surviving another 10 years in the presence of a meta-stably-aligned ASI, the ASI would need a daily non-failure rate x satisfying x^(365.25×10) = 0.5, i.e. x ≈ 0.99981 or better.
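(As a quick sanity check on that number, here is a minimal sketch in Python; the framing of each day as an independent pass/fail trial is my own simplification, just to make the arithmetic explicit:)

```python
# Back-of-the-envelope check: if survival requires the ASI to avoid a
# catastrophic alignment failure every day, independently, then a
# p_survive chance of surviving `years` years needs a daily
# non-failure rate x with x ** (365.25 * years) = p_survive.

def required_daily_rate(p_survive: float, years: float) -> float:
    days = 365.25 * years
    return p_survive ** (1 / days)

print(required_daily_rate(0.5, 10))    # ~0.999810, matching the figure above

# The requirement tightens quickly as the horizon grows:
for years in (10, 100, 1000):
    print(years, "years:", f"{required_daily_rate(0.5, years):.8f}")
```

Under these toy assumptions, the tolerable daily failure rate shrinks roughly in inverse proportion to the horizon you want to survive.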
> It’s a feature of human cognition that we can make what feel like good arguments for anything.
I would tend to agree; confirmation bias is profound in humans.
> you instead asked “hey, what would AI doom predict about the world, and are those predictions coming true?”
Are you suggesting a process such as: assume a future → predict what would be required for this → compare predictions with reality?
Rather than: observe reality → draw conclusions → predict the future?
Nullius in verba is a good motto. Thank you for your reply, 1a3orn; I will have a read over some of the links you posted.