First, there are questions like "if the agent expects that I wouldn't be able to verify the extreme disutility, would its utility function be such that it would actually go through with spending the resources to cause the unverifiable disutility?"
Whether an entity with such a utility function would even exist, and would manage to stick around long enough in the first place, may itself drop the probabilities by a whole lot.
Perhaps it's best to restrict ourselves to the case where the disutility is verifiable, but only after the fact (has this agent ever pulled this sort of thing before? etc.), and where that verification doesn't open up, in the present, a causal link allowing for other means of preventing the disutility. There's a lot going on here.
I'm not sure, but maybe the reasoning wouldn't go through the single specific case so much; rather, the process would compute the expected utility of following a rule that would leave it utterly vulnerable to any agent that merely claims to be capable of causing bignum units of disutility.
The reasoning would be something along the lines of: following such a rule would allow agents in general to order the process around and cause plenty of disutility, and that, in itself, would seem to carry plenty of expected disutility.
However, if after chugging through the math it still didn't balance out, and the expected disutility from the existence of the disutility threat was greater, then perhaps allowing oneself to be vulnerable to such threats is genuinely the correct outcome, however counterintuitive and absurd it would seem to us.
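To make the "chugging through the math" step a bit more concrete, here is a minimal sketch of the kind of rule-level comparison being described. Every number in it (the probability a given threat is genuine, the size of the claimed disutility, the cost of complying, and how many threats each policy invites) is an invented assumption purely for illustration; only the structure of the comparison is the point.

```python
# Purely illustrative sketch of the rule-level expected-utility comparison
# described above. Every number below is an invented assumption.

p_real = 1e-12        # assumed probability that any given threat is genuine
claimed_harm = 1e18   # assumed "bignum" units of threatened disutility
comply_cost = 1e3     # assumed disutility of giving in to a single threat

# Case-by-case view: for one threat, compare the cost of complying
# against the expected harm of refusing.
eu_comply_once = -comply_cost              # -1e3
eu_refuse_once = -p_real * claimed_harm    # -1e6 -> complying looks better here

# Rule-level view: the policy itself changes how many threats arrive.
# A process known to always comply invites cheap bluffs; a process known
# to always refuse makes bluffing pointless. These counts are assumptions.
threats_if_known_to_comply = 1e9
threats_if_known_to_refuse = 1.0

eu_comply_rule = -comply_cost * threats_if_known_to_comply              # -1e12
eu_refuse_rule = -p_real * claimed_harm * threats_if_known_to_refuse    # -1e6

print(f"single case: comply {eu_comply_once:.3g}, refuse {eu_refuse_once:.3g}")
print(f"as a rule:   comply {eu_comply_rule:.3g}, refuse {eu_refuse_rule:.3g}")
```

With these made-up numbers, refusing wins at the rule level even though complying looks better case by case; but, as the previous paragraph says, if the numbers came out the other way then the vulnerable policy could still be the genuinely correct one.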