I don’t think it would be unreasonable to say, “Consuming no more than 100 kWh of energy (gross, total), answer the following question: …” I can’t think of any really dangerous attacks that this leaves open, but I undoubtedly am missing some.
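For scale (my own rough figures, not part of the proposal): 100 kWh is 3.6×10^8 J, on the order of a week of continuous compute for one heavily loaded machine.

```python
# Back-of-the-envelope scale of the proposed budget (assumed figures, not from the thread).
BUDGET_KWH = 100.0
JOULES_PER_KWH = 3.6e6           # by definition, 1 kWh = 3.6 MJ
ASSUMED_POWER_DRAW_W = 500.0     # hypothetical: one loaded server / small GPU box

budget_joules = BUDGET_KWH * JOULES_PER_KWH
runtime_days = budget_joules / ASSUMED_POWER_DRAW_W / 86_400

print(f"{BUDGET_KWH:.0f} kWh = {budget_joules:.1e} J "
      f"≈ {runtime_days:.1f} days of compute at {ASSUMED_POWER_DRAW_W:.0f} W")
```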
Since energy cannot be created or destroyed, and has a strong tendency to spread out every which way through surrounding space, you have to be really careful how you draw the boundaries around what counts as “consumed”. Solving that problem might be equivalent to solving Friendliness in general.
Do you think this is a loophole allowing arbitrary actions? Or do you think that an AI would simply say, “I don’t know what it means for energy to be consumed, so I’m not going to do anything.”
I don’t know much about physics, but do you think that some sort of measure of entropy might work better?
As far as I know, every simple rule either leaves trivial loopholes, or puts the AI on the hook for a large portion of all the energy (or entropy) in its future light cone, a huge amount which wouldn’t be meaningfully related to how much harm it can do.
If there is a way around this problem, I don’t claim to be knowledgeable or clever enough to find it, but this idea has been brought up before on LW and no one has come up with anything so far.
Link to previous discussions?
It seems that many questions could be answered using only computing power (or computing power and network access to more-or-less static resources), and this doesn’t seem like a difficult limitation to put into place. We’re already assuming a system that understands English at an extremely high level. I’m convinced that ethics is hard for machines because it’s hard for humans too. But I don’t see why, given an AI worthy of the name, following instructions is hard, especially given the additional instructions, “be conservative and don’t break the law”.
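For what it's worth, here is a minimal sketch of that kind of limitation in Python (Unix only; `answer_question` is a hypothetical stand-in, and CPU seconds are only a crude proxy for energy, but the point is that the budget is enforced from outside the answerer):

```python
import resource
import signal

def answer_question(question: str) -> str:
    # Hypothetical stand-in for the question-answering system under discussion.
    return "42"

class BudgetExceeded(Exception):
    pass

def _on_cpu_limit(signum, frame):
    raise BudgetExceeded("CPU-time budget exhausted")

def answer_within_budget(question: str, cpu_seconds: int) -> str:
    # Cap the total CPU time this process may spend while answering.
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    usage = resource.getrusage(resource.RUSAGE_SELF)
    already_used = int(usage.ru_utime + usage.ru_stime)
    signal.signal(signal.SIGXCPU, _on_cpu_limit)  # delivered when the soft limit is hit
    resource.setrlimit(resource.RLIMIT_CPU, (already_used + cpu_seconds, hard))
    try:
        return answer_question(question)
    finally:
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))  # restore the previous limit

print(answer_within_budget("What is 6 x 7?", cpu_seconds=60))
```

Of course this only bounds the computation done locally, not the downstream effects of the answer itself, which is exactly the objection raised downthread.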
Dreams of Friendliness
You can reorganise a lot of the world while only paying 100 kWh… And you can create a new entity for 100 kWh, who will then do the work for you, and bring back the answer.
Hence the “total”, which limits the level of reorganization.
Still not well defined—any action you take, no matter how tiny, is ultimately going to influence the world by more than 100 kWh. There is no clear boundary between this and deliberate manipulation.
This doesn’t seem to build the possibly necessary infinite tower of resource restrictions. There has to be a small, finite amount of resources used in the process of answering the question, and verifying that no more resources than that were used for it, and verifying that no more resources were used for the verification, and verifying that no more resources than that were used for the verification of the verification...
An upper bound for many search techniques can trivially be computed without any need for infinite regress.
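A toy illustration (the branching factor, depth cap, and per-node cost are all made up): for a depth-limited search the worst-case work is a closed-form number, so the budget check is a single comparison evaluated before anything runs, and no second process is needed to verify the first.

```python
# Toy illustration: the resource bound on a depth-limited search is a
# closed-form number, so "verifying the verifier" never arises; the
# check is one arithmetic comparison done before the search starts.

BRANCHING_FACTOR = 10
MAX_DEPTH = 6
JOULES_PER_NODE = 1e-3          # made-up per-node energy cost
BUDGET_JOULES = 100 * 3.6e6     # the 100 kWh budget from the thread

# Worst case: a full tree of b^0 + b^1 + ... + b^d nodes.
max_nodes = sum(BRANCHING_FACTOR ** d for d in range(MAX_DEPTH + 1))
worst_case_joules = max_nodes * JOULES_PER_NODE

if worst_case_joules <= BUDGET_JOULES:
    print(f"Search admissible: at most {worst_case_joules:.2e} J of {BUDGET_JOULES:.2e} J")
else:
    print("Refuse to search: worst case exceeds the budget")
```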