Well, right, but my point is that “the thing which differentially caused the sensory experiences to which I refer” does not refer to the same thing as “the thing which would differentially cause similar sensory experiences in the future, after you’ve made your changes,” and it’s possible to specify the former rather than the latter.
The AI can influence sensory experiences, but it can’t retroactively influence sensory experiences. (Or, well, perhaps it can, but that’s a whole new dimension of subversive. Similarly, I suppose a sufficiently powerful optimizer could rewrite the automaton rules in case #2, so perhaps we have a similar problem there as well.)
You need to describe the sensory experience as part of the AI’s utility computation somehow. I thought it would be something like a bitstring representing a brain scan, which can refer to future experiences just as easily as past ones. Do you propose to include a timestamp? But the universe doesn’t seem to have a global clock. Or do you propose to say something like “the values of such-and-such terms in the utility computation must be unaffected by the AI’s actions”? But we don’t know how to define “unaffected” mathematically…
I was thinking in terms of referring to a brain. Or, rather, a set of them. But a sufficiently detailed brain scan would work just as well, I suppose.
And, sure, the universe doesn’t have a clock, but a clock isn’t needed, only an ordering: the AI attends to evidence about sensory experiences that occurred before it received the instruction.
Of course, maybe the AI is incapable of figuring out whether a given sensory experience occurred before it received the instruction… it’s just not smart enough. Or maybe the universe is weirder than I imagine, such that the order in which two events occur is not something the AI and I can actually agree on… which is the same case as “perhaps it can in fact retroactively influence sensory experiences” above.
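To make the distinction concrete, here is a toy sketch in Python of what I have in mind (purely illustrative; the names, the ordering index, and the stand-in scoring are all invented for the example). The “fixed reference” term is computed from a snapshot of pre-instruction evidence, so nothing the AI does afterward changes which experiences it refers to; the “live reference” version is recomputed over whatever the log contains at evaluation time, which is exactly the reading that lets the AI’s actions move the target.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    """A recorded sensory experience, e.g. a serialized brain scan."""
    data: bytes
    order_index: int  # position in the AI's event ordering (no global clock needed)


def pre_instruction_evidence(log: List[Observation], instruction_index: int) -> List[Observation]:
    """Keep only observations the AI received strictly before the instruction.

    This is the 'ordering, not a clock' idea: we never compare wall-clock
    timestamps, only the order in which events reached the AI.
    """
    return [obs for obs in log if obs.order_index < instruction_index]


def utility_from_fixed_reference(frozen_evidence: List[Observation]) -> float:
    """Utility computed from a snapshot taken at instruction time.

    Because the snapshot is fixed, later actions cannot change which
    experiences this term refers to.
    """
    return float(sum(len(obs.data) for obs in frozen_evidence))  # stand-in scoring


def utility_from_live_reference(full_log: List[Observation]) -> float:
    """Utility recomputed over whatever the log contains at evaluation time.

    This is the problematic reading: actions that add or alter observations
    after the instruction change the value of this term.
    """
    return float(sum(len(obs.data) for obs in full_log))  # stand-in scoring


if __name__ == "__main__":
    log = [Observation(b"scan-0", 0), Observation(b"scan-1", 1)]
    instruction_index = 2  # the instruction arrives after event 1
    frozen = pre_instruction_evidence(log, instruction_index)

    # A post-instruction experience the AI itself influenced.
    log.append(Observation(b"scan-2-influenced", 3))

    print(utility_from_fixed_reference(frozen))  # unchanged by the new observation
    print(utility_from_live_reference(log))      # shifts with the new observation
```

None of this answers the harder worries above (retroactive influence, or the AI and me disagreeing about the ordering); it only pins down which of the two referents I mean.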