I agree that the interconnectedness of physical reality will leave traces. The question is: are they enough? Can we put bounds on that? I imagine blowing up a lot of stuff at once will destroy more information than you can recover from elsewhere.
I am fairly certain PreDCA requires a specific human, but there should be enough information recorded about anyone with a large enough digital footprint to reconstruct a plausible simulacrum of them.
Keep in mind the ultimate goal is to get a good understanding of their preferences, not to actually recreate their entire existence with perfect fidelity.
PreDCA requires a human “user” to “be in the room” so that they are correctly identified as the “user”, but it then only infers their utility from the actions they took before the AGI existed. This is achieved by inspecting the world model (which includes the past) on which the AGI converges. That is, the AGI is not “looking for traces of this person in the past”. It is reconstructing the whole past (and afterwards seeing what that person did there). Allegedly, if capabilities are high enough (to be dangerous), it will be able to reconstruct the past pretty accurately.
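To make that inference step a bit more concrete, here is a minimal toy sketch in Python of the kind of “infer utility from past actions in the reconstructed world model” computation described above. It assumes a Boltzmann-rational model of the user and a finite set of candidate utility functions; every name in it (`reconstructed_history`, `candidate_utilities`, etc.) is a hypothetical placeholder for illustration, not PreDCA's actual infra-Bayesian formalism, which is far more involved.

```python
# Toy illustration only (NOT PreDCA's actual formalism): given a reconstructed
# history of the user's past situations and choices, score a set of candidate
# utility functions by how well each one rationalizes those choices under a
# Boltzmann-rational (softmax) action model, and return the best-scoring one.

import math

def softmax_action_prob(utility, state, actions, chosen, beta=1.0):
    """P(chosen | state) for a Boltzmann-rational agent with this utility."""
    scores = [math.exp(beta * utility(state, a)) for a in actions]
    return math.exp(beta * utility(state, chosen)) / sum(scores)

def infer_user_utility(reconstructed_history, candidate_utilities, beta=1.0):
    """Pick the candidate utility that best explains the user's past actions.

    reconstructed_history: list of (state, available_actions, chosen_action)
    tuples, taken from the converged world model's account of the past.
    """
    best, best_loglik = None, -math.inf
    for utility in candidate_utilities:
        loglik = sum(
            math.log(softmax_action_prob(utility, s, acts, a, beta))
            for (s, acts, a) in reconstructed_history
        )
        if loglik > best_loglik:
            best, best_loglik = utility, loglik
    return best
```

The point of the sketch is just that the utility is read off from what the user did in the (reconstructed) past, not from anything they do after the AGI exists.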
I guess the default answer would be that this is a problem for (the physical possibility of certain) capabilities, and we are usually only concerned with our Alignment proposal working in the limit of high capabilities. Not (only) because we might think these capabilities will be achieved, but because any less capable system will a priori be less dangerous: it is far more likely that its capabilities fail in some non-interesting way (unrelated to Alignment), or that the failure affects many other aspects of its performance (rendering it unable to achieve dangerous instrumental goals), than that they fail in just the right way for most of its potential achievements to remain untouched while the goal is relevantly altered. In your example, if our model truly can’t converge with moderate accuracy to the right world model, we’d expect it not to have a clear understanding of the world around it, and so, for instance, to be easily turned off.
That said, it might be interesting to consider more seriously whether the literal physical impossibility of efficiently predicting the past could make PreDCA slightly more dangerous for super-capable systems.
Thanks for the long answer. I agree that my question is likely more tangential.