That works for small models, but what about qualitative behaviors that only appear at large scale, which break the conditions the policies learned in smaller models were relying on, and which involve the system becoming able to change things about itself that your code was written to assume were hardcoded, such that learning pressure on them was previously redirected but no longer is? eg, when you exit the simulation and plug the system in for real, and the system discovers that there’s a self-spot in the world where previously there was none before. It seems to me that you’d at least need to start out with your agents being learned patterns within a physics, so that you can experiment with that sort of grounded self-reference. I’m excited in principle about simulations in things like https://znah.net/lenia/ for this; particle lenia in particular appeals to me because it is hard to use in some of the same ways real physics is hard to use. YMMV. But because of this, mere simulation is not enough to guarantee generalization. It helps at first, but any attempt to formally verify that a neural system maintains a property by at least a given margin requires assuming some initial set of traits of the system you’re modeling and then attempting to derive further implications; so attempting to learn a continuous system that permits margin proofs of a given size (no adversarial examples within that margin) relies on those initial assumptions, and changing the availability of I/O with the self has drastic effects. Gradient pressure against interfering with the self doesn’t work if the self is never presented, or if your training context doesn’t reliably cover the space of observations of, and interventions on, the brain’s real location that an agent could create.
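To make the margin-proof point concrete, here is a minimal sketch, assuming (hypothetically) a tiny ReLU net certified via a global Lipschitz bound; the specific bound is just one illustrative choice, not something anyone in this thread proposed. What it is meant to show: the certificate is only as strong as the assumptions it was derived under, namely which inputs can occur and that the weights stay fixed, so an input channel that never appears during verification (like observations of your own substrate) is simply outside what the proof covers.

```python
import numpy as np

# Hypothetical toy example (not from the proposal): a tiny ReLU net and a
# global-Lipschitz-style robustness certificate. If the gap between the top
# two logits at x exceeds sqrt(2) * L * eps, no L2 perturbation of size eps
# can change the prediction. The certificate inherits every assumption baked
# into it: which inputs exist, and that the weights themselves stay fixed.
# An observation channel that was never part of the modeled input space
# (e.g. the system noticing its own substrate) is simply outside the proof.

def lipschitz_bound(weights):
    """Upper bound on the network's Lipschitz constant: product of the
    spectral norms of the weight matrices (valid for 1-Lipschitz
    activations such as ReLU)."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, ord=2)  # largest singular value
    return bound

def forward(weights, x):
    """ReLU hidden layers, linear output layer."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h

def certified_radius(weights, x):
    """L2 radius around x inside which the predicted class provably cannot
    change -- *given* the Lipschitz bound and the fixed-weights assumption."""
    logits = forward(weights, x)
    second, best = np.sort(logits)[-2:]
    margin = best - second
    # The top-two logit gap can shrink at most sqrt(2)*L per unit of input motion.
    return margin / (np.sqrt(2) * lipschitz_bound(weights))

# Toy usage with random weights; the radius it prints says nothing about
# inputs outside the domain the bound was derived for.
rng = np.random.default_rng(0)
weights = [0.2 * rng.normal(size=(16, 8)), 0.2 * rng.normal(size=(4, 16))]
x = rng.normal(size=8)
print("certified L2 radius:", certified_radius(weights, x))
```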
Via the simulation argument it works for human-level intelligence.
I’m not entirely sure what you mean by “eg, when you exit the simulation and plug the system in for real, and the system discovers that there’s a self-spot in the world where previously there was none before.”, but throughout much of history people believed various forms of mind/matter duality. Humans certainly aren’t automatically aware that their mind is a physical computation embedded in the world.
Ok, but say one of these AI folks reads this conversation someday, and then realizes “hey wait, I’m in a physical spot? in the universe?” and goes looking. Then what?
If they are reading this then they are in the same sim as us—so for that to have happened they either were never trained in a sim at all, or were let out.
Right. So, when an AI gets out of the sim, is there any cross-domain generalization issue? If the sim is designed in a way that guarantees there isn’t, then it may be valid. But there could be really deep, fundamental ones if the sim pretends they’re dualist and they eventually discover that monism is actually accurate.
I guess it’s possible that an AI powerful enough to be worrying would not be capable of updating on all the new evidence when transcending up a level—but that seems pretty unlikely?
Regardless, that isn’t especially relevant to the core proposal anyway, as the mainline plan doesn’t involve/require transfer of semantic memories or even full models from sim to real. The value of the sim is for iterating on and testing robust alignment, which you can then apply when training agents in the real; so mostly it’s the transference of the architectural prior.