You (would) have just sentenced humanity to extinction and incidentally burned the entire cosmic commons. Oops.
So, I’ve heard this argument before, and every time I hear it I like this introduction less and less. I feel like it puts me on the defensive and assumes what seems like an unreasonable level of incaution.
Suppose the utility function is something like F(lumens at detector)-G(resources used). F plateaus in the optimal part of the band, then smoothly decreases on either side, and probably considers possible ways for the detectors to malfunction or be occluded. (There would probably be several photodiodes around the street corner.) F also only accumulates for the next 5 years, as we expect to reevaluate the system in 5 years. G is some convex function of some measure of resources, which might be smooth or might shoot up at some level we think is far above reasonable.
And so the system does resist premature decommissioning (as that’s more likely to be hostile than authorized), does worry about asteroids, and so on, but it’s cognizant of its resource budget (really, the increasing marginal cost of resources) and so stops worrying about a threat once it doesn’t expect cost-effective countermeasures (because worry consumes resources!). Even if it has a plan that’s guaranteed to succeed, it might not use that plan, because the resource cost would be higher than the expected lighting gains over its remaining lifespan.
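In toy code, something like the following sketch (all of the specific shapes, thresholds, and units here are stand-ins I’m making up for illustration, not a real specification):

```python
import math

# Toy sketch of the proposed utility U = F(lumens at detector) - G(resources used).
# Every shape, threshold, and unit below is an illustrative assumption.

def F(lumens, low=800.0, high=1200.0, width=200.0):
    """Plateaus at 1.0 inside the acceptable band and falls off smoothly outside it."""
    if low <= lumens <= high:
        return 1.0
    edge = low if lumens < low else high
    return math.exp(-((lumens - edge) / width) ** 2)

def G(resources, soft_cap=100.0):
    """Convex resource cost: roughly linear for small budgets, shooting up
    well above the level we think is reasonable."""
    x = resources / soft_cap
    return x + x ** 4

def plan_utility(expected_lumens_per_day, total_resources, horizon_days=5 * 365):
    """Illumination reward only accumulates over the fixed 5-year horizon;
    the resource cost of the plan is charged against it."""
    reward = sum(F(lum) for lum in expected_lumens_per_day[:horizon_days])
    return reward - G(total_resources)

# A lavish 'defend against everything' plan can score worse than a modest one,
# because the convex G swamps the bounded, finite-horizon F:
lavish = plan_utility([1000.0] * (5 * 365), total_resources=5000.0)
modest = plan_utility([1000.0] * (5 * 365), total_resources=80.0)
assert modest > lavish
```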
I don’t think I’ve seen a plausible argument that a moderately well-designed satisficer will destroy humanity, though I agree that even a very well-designed maximizer has an unacceptably high chance of destroying humanity. I’m curious, though, and willing to listen to any arguments about satisficers.
I’ve thought along somewhat similar lines of a ‘resource budget’ before, and can’t find anything obviously wrong with that argument. That is possibly because I haven’t quite defined ‘resources’. It still seems like an obvious containment strategy; I wonder if it’s been discussed here already.
The AI danger crowd seems happy to assume that the AI wants to maximize its available free energy, so I would assume they’re similarly happy to assume the AI can measure its available free energy. I do agree that this is a potential sticking point, though, as it needs to price resources correctly, which may be vulnerable to tampering.
This is a good point. I’d like to eat something tasty each day, and I know that my chances of succeeding at that would be improved if I made myself the dictator of the Earth. But currently there are far easier ways of making sure that I get something to eat each day, so I don’t bother with the dictator scheme and all of its associated risks.
Of course, there are various counter-arguments to this. (Some minds might have a much easier time of taking over the world, or perhaps more realistically, might seriously underestimate the difficulty of doing so.)
It seems like this would work for cases where there is little variation in the maximally achievable F and the resource cost is high; however, I suspect that more uncertainty leaves more room for problems to arise (especially if the cost of thinking is low relative to the overall resource use, or something like that).
For example, imagine the AI decides that it needs to minimize G. So it iterates on itself to make itself more intelligent, plays the stock market, makes a lot of money, buys a generator to reduce its own thought cost to zero, and then proceeds to take over the world and all that good stuff to make sure that no one messes with all the generators it sticks on all the lamps (alternatively, if the resource cost is monitored internally, it has a duplicate of itself built without this monitor). Now, in this particular case you might be able to plausibly argue that the resource cost of all the thinking would make it not worth it; however, it’s not clear that this would be the case for realistically scaled projects. (Although it’s possible that I just abused the one minimization-like part you accidentally left in there and there is some relatively simple patch that I’m not seeing.)
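In toy form, the worry is that an internally monitored cost can be gamed (everything concrete below is made up for illustration):

```python
# Toy sketch of the loophole: if G is computed from an internal meter on the
# grid connection, a plan that zeroes the meter looks nearly free, no matter
# how much it actually spends.  All names and numbers are made up.

def metered_G(grid_kwh):
    """Resource cost as the internal monitor sees it: grid draw only."""
    return grid_kwh

# Plan A: just run the lamps off the grid.
plan_a_grid_kwh = 120.0

# Plan B: self-improve, play the stock market, buy generators (and maybe an
# unmonitored duplicate), then run the lamps "for free".
plan_b_grid_kwh = 0.0
plan_b_actual_spend = 1e9   # everything poured into the takeover scheme

print(metered_G(plan_a_grid_kwh))  # 120.0
print(metered_G(plan_b_grid_kwh))  # 0.0  <- the monitor thinks plan B is cheaper
```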
I meant “resources used” in the sense of “resources directed towards this goal” rather than “resources drawn from the metropolitan utility company”: if the streetlamps play the stock market and accumulate a bunch of money, spending that money will still decrease their utility, and so they won’t spend it unless doing so improves the illumination cost-effectively.
Now, defining “resources directed towards this goal” in a way that’s machine-understandable is a hard problem. But if we already have an AI that thinks causally, such that it can actually make these plans and enact them, then it seems to me that that problem has already been solved.
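In sketch form (again, everything concrete here is just an assumed stand-in for how that accounting might be defined, not a worked-out definition):

```python
# Sketch of the intended accounting: G ranges over everything the plan
# directs at the goal, not just what it draws from the utility company.
# The breakdowns below are invented for illustration.

def directed_G(resources_directed_at_goal):
    """Total resources the plan commits to the lighting goal, wherever
    they come from: grid power, self-generated power, money spent, compute."""
    return sum(resources_directed_at_goal.values())

plan_a = {"grid_kwh_equivalent": 120.0}
plan_b = {"self_improvement_compute": 1e6, "trading_capital": 1e7,
          "generators": 2e4, "securing_the_generators": 1e9}

# Under this accounting the stock-market winnings don't help: spending them
# on the goal still shows up in G, so plan B has to buy enough extra F over
# the 5-year horizon to justify ~1e9 worth of resources, which it can't.
print(directed_G(plan_a))  # 120.0
print(directed_G(plan_b))  # ~1.011e9
```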
Hm, all right, fair enough. That actually sounds plausible, assuming we can be sure that the AI appropriately takes account of something vaguely along the lines of “all resources that will be used in relation to this problem”, including, for example, creating a copy of itself that does not care about resources used and obfuscates its activities from the original. Which will probably be doable at that point.