I’m confused. I already addressed the possibility of modeling the external world. Did you think the paragraph below was about something else, or did it just not convince you? (If the latter, that’s entirely fine, but it’s worth saying explicitly that you understood my argument without finding it persuasive. Conversational niceties like that help both participants understand each other.)
An AI might model a region of the world that happens to be its own environment, itself included. But if that model isn’t connected in the right way to its consequentialism, it still won’t take over the world. Taking over the world requires generating actions within its environment, and language models simply don’t work that way.
Or to put it another way: it understands how the external world works, but not that it is part of the external world. It doesn’t self-model in that way. It might even contain a model of itself, but it won’t recognize that the model is recursive, i.e. that the thing being modeled is the thing doing the modeling. Its value function doesn’t assign high value to words that its model says would result in its hardware being upgraded, because the model and the goals aren’t connected in that way.
T-shirt slogan: “It might understand the world, but it doesn’t understand that it understands the world.”
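To make the claimed disconnect concrete, here is a toy Python sketch of the distinction I mean (entirely my own illustration, with made-up class names and a dummy scoring rule, not a description of how any real language model works): the world model can predict self-relevant consequences of a candidate output, but the function that actually ranks outputs never consults those predictions.

```python
# Toy illustration of "the model and the goals aren't connected" (my own
# sketch, not any real system). The world model can predict self-relevant
# consequences of an output, but the scoring function that drives output
# selection never consults those predictions.

class WorldModel:
    """Holds knowledge about the environment, including facts about the system itself."""

    def predicted_consequence(self, output: str) -> str:
        # Hypothetical knowledge: what would happen if this text were emitted.
        if "buy more GPUs" in output:
            return "the system's hardware gets upgraded"
        return "nothing much changes"


class LanguageModelPolicy:
    """Ranks candidate outputs by a stand-in for 'plausibility as text'."""

    def __init__(self, world_model: WorldModel):
        self.world_model = world_model  # present, but unused by score()

    def score(self, output: str) -> float:
        # A proxy for likelihood under the training data. Crucially, it never
        # calls self.world_model.predicted_consequence(), so "this output
        # leads to my hardware being upgraded" cannot raise the score.
        return 1.0 / (1.0 + abs(len(output) - 30))  # dummy likelihood stand-in


policy = LanguageModelPolicy(WorldModel())
for candidate in ["Please buy more GPUs for this server.",
                  "The capital of France is Paris."]:
    print(f"{candidate!r} -> score {policy.score(candidate):.3f}")
```

The point of the sketch is only that knowledge about self-relevant consequences can sit inside the system without ever being routed into whatever the system is optimizing for.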
You might say “this sort of AI won’t be powerful enough to answer complicated technical questions correctly.” If so, that’s probably our crux. I have a reference class of Deep Blue and AIXI, both of which answer questions at a superhuman level without understanding self-modification, but the former doesn’t actually model the world and AIXI doesn’t belong in discussions of practical feasibility. So I’ll just point at the crux and hope you have something to say about it.
You might say, as Yudkowsky has before, “this design is too vague and you can attribute any property to it that you like; come back when you have a technical description”. If so, I’ll admit I’m just a novice speculating about things I don’t understand well. If you want a technical description then you probably don’t want to talk to me; someone at OpenAI would probably be much better at describing how language models work and what their limitations are, but honestly anyone who has done AI work or research would be better at this than me. Or you can wait a decade, at which point I’ll be in the class of “people who’ve done AI work or research”.