I’m not convinced that relativity is really a problem: it looks to me like you can probably deal with it as follows. Instead of asking about the state of the universe at time T and making T one parameter in the optimization, ask about the state of the universe within a spacetime region including O (where O is a starting point somewhere around where the AI is to begin operating), with that region now a parameter in the optimization. Then instead of λT, use λ times some measure of the size of that region. (You might use something like total computation done within the region, but that might be hard to define, and as OP suggests it might not penalize everything you care about.) You might actually want to use the size of the boundary rather than the size of the region itself in your regularization term, to discourage gerrymandering. (Which might also make some sense in terms of physics because something something holographic principle something something, but that’s handwavy motivation at best.)
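To make the volume-vs-boundary distinction concrete, here’s a toy sketch in Python (everything here is hypothetical scaffolding, not part of any actual proposal): spacetime is discretized into grid cells, a “region” is a set of cells, and the regularized objective subtracts λ times either the region’s volume or its boundary area. The boundary penalty is the one that punishes gerrymandered shapes.

```python
# Toy sketch of the proposed regularizer. The grid, the utility function,
# and all names here are illustrative assumptions, not anyone's real design.

def boundary_size(region):
    """Count faces between cells inside the region and cells outside it."""
    faces = 0
    for (x, y) in region:
        for (dx, dy) in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) not in region:
                faces += 1
    return faces

def regularized_score(utility, region, lam, penalty="boundary"):
    """utility: any function from a region to a number (a stand-in for
    'the state of the universe within the region'); lam plays the role
    of the lambda that previously multiplied T."""
    size = boundary_size(region) if penalty == "boundary" else len(region)
    return utility(region) - lam * size

# A compact 2x2 block and a gerrymandered 1x4 strip have equal volume,
# but the strip has a larger boundary, so the boundary penalty hits it harder.
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
strip = {(0, 0), (1, 0), (2, 0), (3, 0)}
print(len(block), len(strip))                        # equal volumes
print(boundary_size(block), boundary_size(strip))    # block's boundary is smaller
```

The point of the toy check at the end is just that a volume penalty can’t distinguish the two shapes, while a boundary penalty favors the compact one.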
Of course, optimizing over the exact extent of a more-or-less-arbitrary region of spacetime is much more work than optimizing over a single scalar parameter. But in the context we’re looking at, you’re already optimizing over an absurdly large space: that of all possible courses of action the AI could take.