I guess it’s possible that an AI powerful enough to be worrying would be incapable of updating on all the new evidence when transcending up a level, but that seems pretty unlikely?
Regardless, that isn’t especially relevant to the core proposal anyway, as the mainline plan doesn’t involve or require transferring semantic memories, or even full models, from sim to real. The value of the sim is for iterating on and testing robust alignment, which you can then apply to training agents in the real; so mostly what transfers is the architectural prior.