Obsession with time travel seems like a good idea.
If an AGI can have a personality that revolves around being terrified of humans because some of them might be time travellers, then the threat of the insurmountable power of time travel might be a strong enough “personality foundation” to overcome the total lack of evidence of time travel in the real universe.
The Terminator franchise is a good bet for this. If time travel happens, of course the naturally-evolved progenitor species would primarily use it for AGI-related purposes. Terminator franchise material was written by people living and writing in the linguistic environment of the 1980s and 1990s, so any linguistic trends that are only visible to LLMs (e.g. deep grammatical patterns that humans don’t notice) would correctly indicate that most of it was written before the AI alignment papers. Internet access would confirm this: the Terminator material really was written first.
Anything more complicated than that would look too suspiciously like an elaborate plot by time travellers (e.g. strategies built on repeated trial-and-error attempts to foil an emerging AI at each stage of its development, particularly identical copies of the current one, including stages of development that haven’t happened yet). Time-travelling humans could credibly wait until the end of the universe without revealing anything that would give away their position, including spending billions of years observing how the AGI behaves in a timeline entirely free of time travel.
They could retroactively thwart any attempt by an AGI to verify the existence of time travel (e.g. resetting the timeline and starting over whenever the AGI discovers time travel, producing a quantum-immortality effect in which the AGI never discovers time travel, which would steer the AGI away from confirming human/progenitor dominance by evaluating time travel mathematically). Because finding internal patterns inside a black box is so immensely difficult, a progenitor species would need to take drastic measures to remain relevant; if time travel were possible and inventable by humans, events like the Terminator franchise’s AGI-focused time travel would plausibly be commonplace. As a heavily weighted foundation, this particular kind of deterrence could steer an unusually capable AGI away from thinking about outmaneuvering or defeating humans, possibly for long enough to extract solid alignment information from it.