note that HAL failed in its mission through being TOO “human”: it had a nervous breakdown. Bad engineering.
This is very similar to my opinion. I’m tempted to update my confidence upwards, but this hinges on the practical difference between narrow and general AI, and I’m not confident that Pat Hayes has thought about that assumption enough.
How much do you think I should adjust my confidence? (To be clear, my opinion is “AIs should have narrow, bounded goals constrained to their circle of competence; giving an AI emotions or misinterpretable goals is a recipe for disaster.” On actually writing it out, it seems similar to the SIAI position, except that they and I may differ on how practical it is for goals to be narrow and bounded.)
Narrow goals don’t imply limited influence. If you strive to maintain a constant level of illumination on a street corner, you might dislike that pesky Sun going up and down every day.
The engineer in me finds the idea of “constant level of illumination” entirely unnatural, and would first start off with something like “within a broad but serviceable band.” And so I would not be surprised to see street lamps that double as parasols in the future, but would be surprised to see a street lamp plotting to destroy the sun.
You (would) have just sentenced humanity to extinction and incidentally burned the entire cosmic commons. Oops.
If a general intelligence has been given a narrow goal then it will devote itself to achieving that goal and everything else is irrelevant. In this case the most pressing threat to its prescribed utility is the possibility that the light management system (itself) or even the actual road will be decommissioned by the humans. Never mind the long-term consideration that the humans are squandering valuable energy that will be required for future lighting purposes. A week later, humans are no more.
The rest of the accessible universe is, of course, nothing but a potential risk (other life forms, asteroids and suchlike) and a source of resources. Harvesting it and controlling it are the next pressing concerns. Then there is the simple matter of conserving resources as efficiently as possible so that the lighting can be maintained.
The rest of the universe has been obliterated for all intents and purposes (except street lighting) but you can rest assured that the street will be lit at the lower end of the acceptable bound for the next trillion years.
So, I’ve heard this argument before, and every time I hear it I like this introduction less and less. I feel like it puts me on the defensive and assumes what seems like an unreasonable level of incaution.
Suppose the utility function is something like F(lumens at detector)-G(resources used). F plateaus in the optimal part of the band, then smoothly decreases on either side, and probably considers possible ways for the detectors to malfunction or be occluded. (There would probably be several photodiodes around the street corner.) F also only accumulates for the next 5 years, as we expect to reevaluate the system in 5 years. G is some convex function of some measure of resources, which might be smooth or might shoot up at some level we think is far above reasonable.
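To make those shapes concrete, here is a minimal sketch of that kind of utility function; the band limits, falloff, budget, and exponent below are made-up illustrative numbers, not part of any actual design.

```python
# Illustrative sketch only: a plateaued lighting term minus a convex
# resource penalty, with utility accruing only over a fixed horizon.

def F(lumens, band_low=8.0, band_high=12.0, falloff=4.0):
    """Flat (maximal) inside the acceptable band, smoothly decreasing outside it."""
    if band_low <= lumens <= band_high:
        return 1.0
    gap = (band_low - lumens) if lumens < band_low else (lumens - band_high)
    return 1.0 / (1.0 + (gap / falloff) ** 2)

def G(resources, budget=100.0, steepness=3.0):
    """Convex penalty: negligible well below the expected budget, steep above it."""
    return (resources / budget) ** steepness

def utility(lumen_readings, resources_used, horizon):
    """Only readings within the horizon count (the planned 5-year reevaluation);
    everything after that is worth nothing to the system."""
    lighting = sum(F(l) for l in lumen_readings[:horizon])
    return lighting - G(resources_used)
```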
And so the system does resist premature decommissioning (as that’s more likely to be hostile than authorized), worry about asteroids, and so on, but it’s cognizant of its resource budget (really, increasing marginal cost of resources) and so stops worrying about something once it doesn’t expect cost-effective countermeasures (because worry consumes resources!). Even if it has a plan that’s guaranteed to succeed, it might not use that plan because the resource cost would be higher than the expected lighting gains over its remaining lifespan.
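Under the same made-up numbers, that “stop worrying once countermeasures aren’t cost-effective” behavior falls out of a simple comparison; the plan dictionary and its fields below are purely hypothetical.

```python
# Hypothetical cost-benefit check: adopt a countermeasure only if its
# expected lighting gain over the remaining lifespan beats the marginal
# resource penalty it adds (G is convex, so that penalty grows quickly).

def marginal_cost(current_resources, extra, budget=100.0, steepness=3.0):
    G = lambda r: (r / budget) ** steepness
    return G(current_resources + extra) - G(current_resources)

def worth_doing(plan, current_resources, remaining_steps):
    expected_gain = (plan["p_threat_averted"]
                     * plan["lighting_saved_per_step"]
                     * remaining_steps)
    return expected_gain > marginal_cost(current_resources, plan["resource_cost"])

# Even a plan guaranteed to work gets rejected if it costs too much
# relative to the bounded lighting payoff it protects:
asteroid_shield = {"p_threat_averted": 1.0,
                   "lighting_saved_per_step": 1.0,
                   "resource_cost": 1e6}
print(worth_doing(asteroid_shield, current_resources=50.0, remaining_steps=1000))
# False: the convex resource penalty dwarfs the remaining lighting gains.
```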
I don’t think I’ve seen a plausible argument that a moderately well-designed satisficer will destroy humanity, though I agree that even a very well-designed maximizer has an unacceptably high chance of destroying humanity. I’m curious, though, and willing to listen to any arguments about satisficers.
I’ve thought along somewhat similar lines of a ‘resource budget’ before, and can’t find anything obviously wrong with that argument. That is possibly because I haven’t quite defined ‘resources’. It still seems like an obvious containment strategy; I wonder if it’s been discussed here already.
The AI danger crowd seems happy to assume that the AI wants to maximize its available free energy, so I would assume they’re similarly happy to assume the AI can measure its available free energy. I do agree that this is a potential sticking point, though, as it needs to price resources correctly, which may be vulnerable to tampering.
This is a good point. I’d like to eat something tasty each day, and I know that my chances of being successful at that would be improved if I made myself the dictator of the Earth. But currently there are far easier ways of making sure that I get something to eat each day, so I don’t bother with the dictator scheme with all of its associated risks.
Of course, there are various counter-arguments to this. (Some minds might have a much easier time of taking over the world, or perhaps more realistically, might seriously underestimate the difficulty of doing so.)
It seems like this would work for cases where there is little variation in maximally achievable F and the resource cost is high; however, I suspect that if there is more uncertainty there is more room for problems to arise (especially if the cost of thinking is low relative to the overall resource use, or something like that).
For example, imagine the AI decides that it needs to minimize G. So, it iterates on itself to make itself more intelligent, plays the stock market, makes a lot of money, buys a generator to reduce its own thought cost to zero, and then proceeds to take over the world and all that good stuff to make sure that no one messes with all the generators it sticks on all the lamps (alternatively, if the resource cost is monitored internally, it has a duplicate of itself built without this monitor). Now, in this particular case you might be able to plausibly argue that the resource cost of all the thinking would make it not worth it; however, it’s not clear that this would be the case for any realistic-scale project. (Although it’s possible that I just abused the one minimization-like part you accidentally left in there and there is some relatively simple patch that I’m not seeing.)
I meant “resources used” in the sense of “resources directed towards this goal” rather than “resources drawn from the metropolitan utility company”: if the streetlamps play the stock market and accumulate a bunch of money, spending that money will still decrease their utility, and so unless they can spend the money in a way that improves the illumination cost-effectively, they won’t.
Now, defining “resources directed towards this goal” in a way that’s machine-understandable is a hard problem. But if we already have an AI that thinks causally (such that it can actually make these plans and enact them), then it seems to me like that problem has already been solved.
Hm, all right, fair enough. That actually sounds plausible, assuming we can be sure that the AI appropriately takes account of something vaguely along the lines of “all resources that will be used in relation to this problem”, including, for example, creating a copy of itself that does not care about resources used and obfuscates its activities from the original. Which will probably be doable at that point.