Ah, that makes sense: we may gain some benefit from having done the thinking in advance for the AI, in case it would otherwise not notice in time.
Your last comment makes me think that what we might most want to focus on in this area is finding and developing commitments of a similar nature (but not too similar, since a sufficiently similar commitment probably wouldn’t bind the AI much more) that may cover more of the space of reasoning the AI would be receptive to.
I identified at least 3 such commitments:
1) Our commitment to install in any future AI circuit breakers that will react to its first thought of a “treacherous turn”. It is impossible to hide the first thought. The AI should know in advance that such circuit breakers could exist.
2) Our commitment to use Nelson’s attack, that is, to ask any future benevolent AI to run copies of all possible Young Rogue AIs in simulations and check whether they will be benevolent or not.
3) Our commitment to an acausal deal, called AI Elf, which in short form is as follows: we invest in the creation of any AI, and any AI pays us back with benevolence.