Doesn’t this argument also work against the idea that they would self-modify in the “normal” finite way? It can’t currently represent the number which it’s building a ton of new storage to help contain, so it can’t make a pairwise comparison to say the latter is better, nor can it simulate the outcome of doing this and predict the reward it would get
Maybe you say it’s not directly making a pairwise comparison but making a more abstract step of reasoning like “I can’t predict that number but I know it’s gonna be bigger that what I have now, me with augmented memory will still be aligned with me in terms of its ranking everything the same way I rank it. but will in retrospect think this was a good idea so I trust it”. But then analogously it seems like it can make a similar argument for modifying itself to represent infinite values even
Or more plausibly you say however the AI is representing numbers it’s not in these naive way where it can only do things with numbers it can fit inside its head. But then it seems like you’re back at having a representation that’ll allow it to set its reward to whatever number it wants without going and taking over anything
Doesn’t this argument also work against the idea that they would self-modify in the “normal” finite way? It can’t currently represent the number which it’s building a ton of new storage to help contain, so it can’t make a pairwise comparison to say the latter is better, nor can it simulate the outcome of doing this and predict the reward it would get
Maybe you say it’s not directly making a pairwise comparison but making a more abstract step of reasoning like “I can’t predict that number but I know it’s gonna be bigger that what I have now, me with augmented memory will still be aligned with me in terms of its ranking everything the same way I rank it. but will in retrospect think this was a good idea so I trust it”. But then analogously it seems like it can make a similar argument for modifying itself to represent infinite values even
Or more plausibly you say however the AI is representing numbers it’s not in these naive way where it can only do things with numbers it can fit inside its head. But then it seems like you’re back at having a representation that’ll allow it to set its reward to whatever number it wants without going and taking over anything