What do you think the difference would be between an AGI’s reward function and that of GPT-2 during the error it experienced?
One is the difference between training time and deployment, as others have mentioned. But the other is that I’m skeptical that there will be a singleton AI that was just trained via reinforcement learning.
Like, we’re going to train a single neural network end-to-end on running the world? And just hand over the economy to it? I don’t think that’s how it’s going to go. There will be interlocking, increasingly powerful systems. See: Arguments about fast takeoff.