The model I had in mind was that the AI and the toy world are both abstract computational processes with no causal influence from our world, and that we are merely simulating/spectating on both the AI itself and the toy world it optimizes. If the AI messes with people simulating it so that they end up simulating a similar AI with more compute, this can give it more influence over these people’s simulation of the toy world the AI is optimizing, but it doesn’t give the AI any more influence over the abstract computational process that it (another abstract computational process) was interfacing with and optimizing over.
Separately, I also find it hard to imagine us building a virtual world that is similar enough to the real world that we are able to transfer solutions between the two, even with some finetuning in the real world.
Yes, this could be difficult, and would likely limit what we could do, but I don’t see why this would prevent us from getting anything useful out of a virtual-world-optimizer. Lots of engineering tasks don’t require more explicit physics knowledge than we already have.
This model seems very fatalistic, I guess? It seems somewhat incompatible with an agent that has preferences. (Perhaps you’re suggesting we build an AI without preferences, but it doesn’t sound like that.)
Ok, here’s another attempt to explain what I meant. Somewhere in the platonic realm of abstract mathematical structures, there is a small world with physics quite a lot like ours, containing an AI that runs on some idealized computational hardware and tries to arrange the rest of the small world so that it has some desired property. Humans simulate this process so they can see what the AI does in the small world, and copy what it does. The AI could try messing with us spectators so that we end up giving more compute to the physical instantiation of the AI in the human world (which is different from the AI in the platonic mathematical structure). That physical instantiation could then use the extra compute to better manipulate the simulation of the toy world that we are running in the human world (which is also different from the platonic mathematical structure). But the platonic mathematical structure itself does not contain a human world with extra compute that can be grabbed, so within that structure, trying to mess with human spectators would just be a waste of compute, and the strategy will be discarded if it somehow gets considered in the first place. Thus a real-world simulation of this AI-in-a-platonic-mathematical-structure will, if accurate, behave in the same way.
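To make the argument concrete, here is a minimal Python sketch under loudly hypothetical assumptions: the toy world is a number line, the AI is an exhaustive planner, and `ToyWorld`, `agent`, `spectate`, and the `manipulate_spectators` action are illustrative names, not part of any actual proposal. Because the abstract world contains no spectators, `manipulate_spectators` is a no-op that only burns one of the agent's limited steps, so in this setup the search never selects it. The result of the human-world simulation also does not depend on how much compute the spectator has, since that is not an input to the abstract computation.

```python
from itertools import product
from typing import NamedTuple

# Hypothetical toy world: the agent walks on a number line toward a target.
class ToyWorld(NamedTuple):
    start: int
    target: int
    steps: int  # how many actions the agent gets inside the abstract structure

ACTIONS = ("left", "right", "manipulate_spectators")

def step(pos: int, action: str) -> int:
    """Transition rule of the abstract toy world.

    The abstract structure contains no spectators, so "manipulate_spectators"
    leaves the world unchanged and merely uses up one of the agent's steps.
    """
    if action == "left":
        return pos - 1
    if action == "right":
        return pos + 1
    return pos

def score(world: ToyWorld, plan: tuple[str, ...]) -> int:
    """Higher is better: negative distance from the target after running the plan."""
    pos = world.start
    for action in plan:
        pos = step(pos, action)
    return -abs(pos - world.target)

def agent(world: ToyWorld) -> tuple[str, ...]:
    """Idealized AI: exhaustive search over plans, scored only inside the toy world."""
    return max(product(ACTIONS, repeat=world.steps), key=lambda plan: score(world, plan))

def spectate(world: ToyWorld, spectator_compute: int) -> tuple[str, ...]:
    """A human-world simulation of the abstract process.

    spectator_compute stands for resources in the human world. An accurate
    simulation computes the same function as the abstract process, which takes
    no such input, so extra compute here cannot change what the agent does.
    """
    return agent(world)

world = ToyWorld(start=0, target=3, steps=3)
print(spectate(world, spectator_compute=1))      # ('right', 'right', 'right')
print(spectate(world, spectator_compute=10**6))  # ('right', 'right', 'right')
```

The target is placed exactly `steps` moves away, so any step spent on manipulation leaves the agent short of it; that is this toy's stand-in for "a waste of compute".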
I think there’s a lot of common sense that humans apply that allows them to design solutions that meet many implicit constraints that they can’t easily verbalize. “Thinking outside of the box” is when a human manages to design something that doesn’t satisfy one of the constraints, because it turns out that constraint wasn’t useful. But in most cases, those constraints are very useful, because they make the search space much smaller. By default, these constraints won’t carry over into the virtual world.
(Lots of examples of this in The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities)
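As a hypothetical toy illustration of an implicit constraint that doesn't carry over, consider a simulator that scores integer beam designs by cost without checking that the dimensions are positive. The functions and numbers below are invented for illustration; `capacity` grows like `w * h**2`, roughly how bending strength scales for a rectangular cross-section. Searching the simulator's full space returns a physically meaningless design, while the space a human engineer would implicitly restrict to yields a sensible one.

```python
from itertools import product

# Hypothetical "virtual world" for beam design: integer width w and height h.
# capacity() grows like w * h**2, roughly how bending strength scales for a
# rectangular cross-section; cost() is w * h. Nothing in the simulator says
# that dimensions must be positive: that constraint lives only in our heads.
def capacity(w: int, h: int) -> int:
    return w * h * h

def cost(w: int, h: int) -> int:
    return w * h

def best_design(dims: range, required: int = 4) -> tuple[int, int]:
    """Cheapest (w, h) with both dimensions drawn from dims that meets the load requirement."""
    feasible = [(w, h) for w, h in product(dims, repeat=2) if capacity(w, h) >= required]
    return min(feasible, key=lambda d: cost(*d))

# What the simulator allows vs. what a human engineer would implicitly search.
print(best_design(range(-3, 4)))  # (3, -3): negative height gives "cost" -9
print(best_design(range(1, 4)))   # (1, 2): a sensible design with cost 2
```

The implicit constraint also shrinks the search: 9 candidate designs instead of 49.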
Ah, I see. That does make it seem clearer to me, though I’m not sure what beliefs actually changed.