I’ve long interpreted Eliezer, in terms of your disagreements [2-6], as offering deliberately exaggerated examples.
I do think you might be right about this [from disagreement 2]:
By the time we have AI systems that can overpower humans decisively with nanotech, we have other AI systems that will either kill humans in more boring ways or else radically advance the state of human R&D.
I do like your points overall for disagreements [1] and [2].
I feel like there’s still something being ‘lost in translation’. When I think of the Eliezer-AGI and why it’s an existential risk, I think that it would be able to exploit a bunch of ‘profitable capability-boosting opportunities’. I agree that a roughly minimal ‘AGI’ might very well not be able to do that. Some possible AGIs could, though. But you’re also right that there are other risks, possibly also existential, that we should expect to face before Eliezer’s specific ‘movie plots’ would be possible.
But then I also think the specific ‘movie plots’ are beside the point.
If you’re right that some other AI system ‘mega-kills’ humans, then that system is the “nanotech” to fear. If the outcome is a foregone conclusion, it’s not much better if the AI takes, e.g., several years to kill us all rather than doing so ‘instantaneously’.
I also have a feeling that:
Some (minimal) ‘AGI’ is very possible, e.g. within the next five years.
The gap between ‘very disruptive’ and ‘game over’ might be very small.
I guess I disagree with your disagreement [7]. I think partly because:
AI systems that can meaningfully accelerate progress by generating ideas, recognizing problems for those ideas, and proposing modifications to proposals, etc.
might be AI systems that are “catastrophically dangerous” because of the above.
I think maybe one disagreement I have with both you and Eliezer is that I don’t think an AI system needs to be ‘adversarial’ to be catastrophically dangerous. A sufficiently critical feature missing from the training data might be enough for a system to generate, e.g., an idea that can apparently be reasonably verified as aligned and yet leads to catastrophe.
I am very happy that you’re asking for more details about a lot of Eliezer’s intuitions. That seems likely to be helpful even if they’re wrong.
I’m skeptical of your disagreement [19]. Is it in fact the case that we’re currently good enough at verifying, e.g., ideas, problems, and proposals? I don’t feel like that’s the case; it’s definitely not obviously so.
I think I’ve updated towards your disagreements [18] and [22], especially because we’re probably selecting for understandable AIs to at least some degree. It seems like people are already explicitly developing AI systems to generate ‘super human’ human-like behavior. Some AI systems probably are, and will continue to be, arbitrarily ‘alien’, though.
For your disagreement [23], I’d really like to read some specific details about how that could work: AIs reasoning about each other’s code.
Overall, I think you made some good/great points and I’ve updated towards ‘hope’ a little. My ‘take on your take (on Eliezer’s takes)’ is that I don’t know what to think really, but I’m glad that you’re both writing these posts!