I have a question about bounded agents. Rob Miles’ video explains a problem with bounded utility functions: namely, that the agent is still incentivized to maximize the probability that the bound is hit, and take extreme actions in pursuit of infinitesimal utility gains.
I agree, but my question is: in practice, isn’t this still at least a little less dangerous than the unbounded agent? An unbounded utility maximizer, given most goals I can think of, will probably accept a plan with only a 1% chance of successfully taking over the world, because the payoff of turning the earth into stamps is so large. Whereas if the bounded utility maximizer is not quite omnipotent and is only chasing essentially tiny increases in its certainty, and finds that its best grand and complicated plan to take over the world is only ~99.9% likely to succeed, the plan may not be worth the extra ~1e-9 of expected utility.
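To make the arithmetic I have in mind concrete, here’s a toy sketch; all the payoffs and probabilities are numbers I made up purely for illustration:

```python
# Toy comparison of the two agents (all numbers are made up for illustration).

# Unbounded stamp maximizer: the payoff of turning the earth into stamps is
# so huge that even a 1% success chance swamps the safe alternative.
p_takeover_success = 0.01
u_earth_of_stamps  = 1e15      # raw utility of a successful takeover
u_safe_plan        = 100.0     # raw utility of just collecting stamps normally

eu_risky_unbounded = p_takeover_success * u_earth_of_stamps   # 1e13
eu_safe_unbounded  = u_safe_plan                              # 100
print(eu_risky_unbounded > eu_safe_unbounded)                 # True: takeover wins

# Bounded stamp maximizer: utility is capped at 1.0 once it believes it has
# its stamps, so a takeover only buys a sliver of extra certainty -- and a
# merely ~99.9% reliable plan loses to quietly rechecking the stamp count.
u_cap              = 1.0
p_certain_safe     = 1.0 - 1e-9   # certainty after rechecking the stamps
p_certain_takeover = 0.999        # its best grand, complicated takeover plan

eu_safe_bounded  = p_certain_safe * u_cap       # ~0.999999999
eu_risky_bounded = p_certain_takeover * u_cap   # 0.999
print(eu_risky_bounded > eu_safe_bounded)       # False: rechecking wins
```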
It’s also not clear that giving the bounded agent more firepower or making it more intelligent monotonically increases P(doom); maybe it comes up with a takeover plan that is >99.9% likely to succeed, but maybe its better reasoning abilities also let it raise its initial confidence that it has the correct number of stamps, and thus favor safer strategies even more strongly.
Perhaps my intuition that world-takeover plans are necessarily complicated and fragile compared to small-scale stamp rechecking is wrong, but it seems like, at least for a lot of intelligence levels between Human and God, the stamp-collecting device would be sufficiently discouraged by the existence of adversarial humans who might precommit to a strategy of countervalue targeting in the case of a failed attempt at world conquest.
I have another question about bounded agents: how would they behave if the expected utility were capped rather than the raw value of the utility? Past a certain point, an AI with a bounded expected utility wouldn’t have an incentive to act in extreme ways to achieve small increases in the expected value of its utility function. But are there still ways in which an AI with a bounded expected utility could be incentivized to restructure the physical world on a massive scale?
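Here’s a toy sketch of the distinction I’m trying to draw, with the cap, the plans, and all the numbers made up purely for illustration:

```python
# Toy sketch: bounding the utility vs. capping the expected utility.
# (The cap, plans, and numbers are all placeholders for illustration.)

CAP = 1.0

def eu_bounded_utility(p_success, raw_utility):
    # Bound the utility itself, then take the expectation: extra nines of
    # certainty are always worth something, so the incentive to act in
    # extreme ways for tiny gains never fully goes away.
    return p_success * min(raw_utility, CAP)

def eu_capped_expectation(p_success, raw_utility):
    # Take the expectation of the raw utility, then cap it: once a plan
    # clears the cap, "better" plans are worth nothing extra.
    return min(p_success * raw_utility, CAP)

raw_utility_of_goal = 5.0                    # "I have my 100 stamps"
recheck  = (0.990, raw_utility_of_goal)      # quietly recheck the collection
takeover = (0.999, raw_utility_of_goal)      # extreme plan for extra certainty

print(eu_bounded_utility(*recheck), eu_bounded_utility(*takeover))
# 0.99 vs 0.999 -> the extreme plan still looks strictly better
print(eu_capped_expectation(*recheck), eu_capped_expectation(*takeover))
# 1.0 vs 1.0 -> indifferent; the cheap plan already hits the cap
```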
This is a satisficer and Rob Miles talks about it in the video.
It’s not clear to me why a satisficer would modify itself to become a maximizer when it could instead just hardcode expected utility=MAXINT. Hardcoding expected utility=MAXINT would result in a higher expected utility while also having a shorter description length.
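Purely as an illustration of what I mean (none of these names are anyone’s actual proposal, just placeholders):

```python
import sys

# Illustrative only: "plan" and "world_model" are made-up placeholders.
MAXINT = sys.maxsize

def maximizer_expected_utility(plan, world_model):
    # What self-modifying into a maximizer buys you: a long, complicated
    # program that models the world and searches over plans, and whose
    # output can never exceed the true optimum anyway.
    raise NotImplementedError("lots of machinery would go here")

def wireheaded_expected_utility(plan, world_model):
    # The alternative: one line, shorter description, and it reports an
    # expected utility at least as high as anything the maximizer reports.
    return MAXINT
```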
That’s true hehe, but that also seems bad.
Yeah, I had a similar thought with capping both the utility and the percent chance, but maybe capping expected utility is better. Then again, maybe we’ve just reproduced quantization.
(+1 on this question)