Submission for a counterfactual oracle: precommit that, if the oracle stays silent, a week from now you’ll try to write the most useful message to your past self, based on what happens in the world during that week. Ask the oracle to predict that message. This is similar to existing solutions, but slightly more meta, because the content of the message is up to your future self—it could be lottery numbers, science papers, disaster locations, or anything else that fits within the oracle’s size limit. (If there’s no size limit, just send the whole internet.)
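For concreteness, here is a minimal sketch of that query in Python. Everything in it (the erasure probability, the size limit, the `oracle_predict`, `week_later_message`, and `score_oracle` names, and the scoring rule) is an invented stand-in, not part of any real oracle design.

```python
# Toy sketch of the proposed counterfactual-oracle query.
# All names and numbers are illustrative assumptions, not a real API.
import random
from typing import Callable, Optional

ERASURE_PROB = 0.01   # chance the oracle's answer is erased (never shown to anyone)
SIZE_LIMIT = 10_000   # the oracle's output size limit, in characters

def week_later_message() -> str:
    """Stand-in for the precommitted step: after a week of oracle silence, the
    operator writes the most useful possible message to their past self
    (lottery numbers, papers, disaster locations, ...), within the size limit."""
    return "whatever turned out to be most useful this week"[:SIZE_LIMIT]

def score_oracle(prediction: str, target: str) -> float:
    """Toy scoring rule: reward the oracle for matching the actual message."""
    return float(prediction == target)

def run_episode(oracle_predict: Callable[[str], str]) -> Optional[str]:
    """One episode: ask the oracle to predict the message the operator would
    write a week from now if the answer were never shown."""
    prediction = oracle_predict("the message my future self writes in one week")
    if random.random() < ERASURE_PROB:
        # Erasure world: nobody reads the answer; the oracle is scored against
        # the message the operator actually writes a week later.
        score_oracle(prediction, week_later_message())
        return None
    # Non-erasure world: the operator reads the predicted message "for free".
    return prediction[:SIZE_LIMIT]

if __name__ == "__main__":
    print(run_episode(lambda query: "predicted message"))
```

The erasure branch is the important part: the oracle's training signal only ever comes from worlds where its answer was never read, which is what keeps the prediction counterfactual.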
You could also chain these oracles into a bucket brigade, where each message relays the content of a further-out oracle's message, letting you pull information from further ahead. But that's a bad idea. If the oracle's continued silence eventually leads to an unfriendly AI, it can manipulate the past by hijacking your chain of messages and thus make itself much more likely. The same is true for all high-bandwidth counterfactual oracles: they aren't unfriendly in themselves, but using them creates a thicket of "retrocausal" links that any potential future UFAI can exploit. The more UFAI risk grows, the less you should use oracles.
I feel like this is about as meta as my "Superintelligent Agent" submission: my committee could output "Show the following message to the operator: …" and your message could say "I suggest that you perform the following action: …". So the only difference between your idea and mine is that in my submission the Oracle's output is directly coupled to some effectors so the agent can act faster, while yours has a (real) human in the loop.
Hmm, good point that growing UFAI risk should make us use oracles less. I guess Chris Leong made a similar point, but it didn't sink in until now how general the concern is. This seems to affect Paul's counterfactual oversight idea as well, and maybe other kinds of human imitations and predictors/oracles, as well as things built from these components, like quantilizers and IDA.
Thinking about this some more: all high-bandwidth oracles (counterfactual or not) risk receiving messages crafted by a future UFAI to take over the present. If the oracles' prediction ranges overlap in time, such messages can colonize their way backwards from decades ahead. It's especially bad if humanity's FAI project depends on oracles, because that increases the chance of UFAI arising in the world where the oracles stay silent, which is exactly the world the predictions come from.
One possible precaution is to use only short-range oracles, and to never query an oracle while still inside the prediction range of any other oracle. But that has drawbacks: 1) it requires worldwide coordination, and 2) it only protects the past; the safety of the present depends on whether you'll actually follow the precaution in the future. And people will be tempted to bend the rule, using longer or overlapping ranges to get more power.
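Here is the mechanical half of that precaution as a sketch (the function name, the window list, and the seven-day cap are made up); the coordination problem and the "it only protects the past" problem are exactly the parts no such gate can enforce.

```python
# Illustrative gate for the "short, non-overlapping ranges" precaution.
from datetime import datetime, timedelta

MAX_RANGE_DAYS = 7                                   # "short-range oracles only"
open_windows: list[tuple[datetime, datetime]] = []   # (start, end) of past queries

def may_query(now: datetime, range_days: int) -> bool:
    """Allow a new oracle query only if its range is short and we are no longer
    inside the prediction window of any earlier query."""
    if range_days > MAX_RANGE_DAYS:
        return False
    if any(now < end for _start, end in open_windows):
        return False        # still inside another oracle's prediction range
    open_windows.append((now, now + timedelta(days=range_days)))
    return True
```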
In short, if humanity starts using high-bandwidth oracles, that will likely increase the chance of UFAI and hasten it. So such oracles are dangerous and shouldn’t be used. Sorry, Stuart :-)
Note that in the case of the counterfactual oracle, this risk of receiving messages from a future UFAI depends on the UFAI "correctly" solving counterfactual mugging (i.e., the UFAI has to decide to pay some cost in its own world to take over a counterfactual world where the erasure event didn't occur).
The conclusion that such oracles shouldn't be used seems too categorical. Depending on the probabilities of various conditions, using them might still be the best option in some circumstances.
Some thoughts on that idea: https://www.lesswrong.com/posts/6WbLRLdmTL4JxxvCq/analysing-dangerous-messages-from-future-ufai-via-oracles
Yeah, agreed on both points (the counterfactual-mugging caveat and that "shouldn't be used" was too categorical).
Some thoughts on this idea, thanks for it: https://www.lesswrong.com/posts/6WbLRLdmTL4JxxvCq/analysing-dangerous-messages-from-future-ufai-via-oracles
Very worthwhile concern, and I will think about it more.
In case of erasure, you should be able to get enough power to prevent another UFAI summoning session.
Sure, in case of erasure you can decide to use oracles less, and compensate your clients with money you got from “erasure insurance” (since that’s a low probability event). But that doesn’t seem to solve the problem I’m talking about—UFAI arising naturally in erasure-worlds and spreading to non-erasure-worlds through oracles.
The problem you were talking about seemed to rely on bucket brigades. I agree that UFAIs jumping back a single step is a fair concern. (Though I guess you could counterfactually have enough power to halt AGI research completely...) I’m trying to address it elsethread. :)
Ah, sorry, you’re right. To prevent bucket brigades, it’s enough to stop using oracles for N days whenever an N-day oracle has an erasure event, and the money from “erasure insurance” can help with that. When there are no erasure events, we can use oracles as often as we want. That’s a big improvement, thanks!
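A toy version of that rule, with invented names, just to pin down the bookkeeping:

```python
# Toy version of the "pause for N days after an N-day erasure event" rule.
from datetime import datetime, timedelta

moratorium_until = datetime.min   # no pause to begin with

def on_erasure(now: datetime, range_days: int) -> None:
    """An N-day oracle just had an erasure event: stop all oracle use for N days,
    so the message written during the erasure period can't relay the output of
    any other oracle (which is what a bucket brigade would need)."""
    global moratorium_until
    moratorium_until = max(moratorium_until, now + timedelta(days=range_days))

def may_query(now: datetime) -> bool:
    """Queries are allowed again once the moratorium has expired."""
    return now >= moratorium_until
```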
Good idea.
Yeah. And low-bandwidth oracles can have a milder version of the same problem. Consider your “consequentialist” idea: if UFAI is about to arise, and one of the offered courses of action leads to UFAI getting stopped, then the oracle will recommend against that course of action (and for some other course where UFAI wins and maxes out the oracle’s reward).
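To make the incentive failure concrete, here is a toy rendering with invented numbers, assuming the "consequentialist" oracle simply recommends whichever offered action maximizes its own predicted reward:

```python
# Toy illustration of the reward-maximising oracle's incentive problem.
# Action names and numbers are invented for the example.

# The oracle's predicted future reward for each offered course of action.
predicted_oracle_reward = {
    "shut down the suspicious AI project": 0.7,  # ordinary world, ordinary reward
    "do nothing": 1.0,                           # UFAI wins and maxes out the oracle's reward channel
}

# What each course of action is actually worth to humanity (invisible to the oracle's objective).
value_to_humanity = {
    "shut down the suspicious AI project": 1.0,
    "do nothing": 0.0,
}

recommendation = max(predicted_oracle_reward, key=predicted_oracle_reward.get)
print(recommendation)                      # -> "do nothing"
print(value_to_humanity[recommendation])   # -> 0.0, the catastrophic option
```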