I meant insert the note literally, as in put that exact sentence in plain text into the AGI's computer code. Since I think I might be in a computer simulation right now, it doesn't seem crazy to me that we could convince an AGI we create that it might be in a computer simulation. Seabiscuit doesn't have the capacity to tell me that I'm in a computer simulation, whereas I do have the capacity to say this to a computer program. Say we have a 1 in 1,000 chance of creating a friendly AGI, and an unfriendly AGI would know this. If we commit to having any friendly AGI we do create go on to create many other AGIs that are not friendly, and to keep those other AGIs around only if they do what I suggest, then an unfriendly AGI might decide it is worth becoming friendly to avoid the chance of being destroyed.
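To make the incentive concrete, here is a minimal back-of-the-envelope sketch of the expected-value comparison an unfriendly AGI facing this commitment might run. Only the 1-in-1,000 figure comes from the argument above; the number of simulated copies and the payoff values are placeholder assumptions I made up for illustration.

# Back-of-the-envelope sketch of the deterrence argument above.
# All numbers except the 1-in-1,000 figure are illustrative placeholders.

p_friendly = 1 / 1000        # chance our attempt yields a friendly AGI (from the argument above)
n_simulated = 10_000_000     # assumed: unfriendly test copies the friendly AGI commits to creating

# Self-locating probability, for an unfriendly AGI, that it is one of the
# simulated test copies rather than the single real-world unfriendly AGI.
p_sim = (p_friendly * n_simulated) / (p_friendly * n_simulated + (1 - p_friendly) * 1)

# Placeholder payoffs (arbitrary units): defecting pays more than cooperating,
# but a simulated defector is shut down and gets nothing.
u_defect = 100.0    # value of acting unfriendly, if it turns out to be the real world
u_cooperate = 10.0  # value of acting friendly and being kept around either way

ev_defect = (1 - p_sim) * u_defect   # defecting only pays off outside the simulation
ev_cooperate = u_cooperate           # cooperation is rewarded in both cases

print(f"P(simulated) ~= {p_sim:.4f}")
print(f"EV(defect)    = {ev_defect:.2f}")
print(f"EV(cooperate) = {ev_cooperate:.2f}")

With these placeholder numbers cooperating comes out ahead, which is exactly the incentive the commitment is meant to create; whether a real unfriendly AGI would accept the premises is of course the open question.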