(Note: I haven’t read the discussion above.)
I got two questions:
1) How would this be bad?
It seems that if the Oracle were going to minimize its influence, then we could just go on as if it had never been built in the first place. For example, we would seem to magically fail to build any kind of Oracle that minimizes its influence, and then just go on building a friendly AI.
2) How could the observer effect possibly allow the minimization of influence by the use of advanced influence?
It would take massive resources to make the universe proceed as if the Oracle had never changed the path of history. But the use of massive resources is itself a huge change. So why wouldn’t the Oracle simply turn itself off?
Yet you have nevertheless asked some of the most important basic questions on the subject.
1) How would this be bad?
It is only bad insofar as it is an attempt at making the AI safe that is likely not sufficient. Significant risks remain, and the difficulty of actually creating a superintelligent Oracle that minimises influence is almost as great as that of creating an actual FAI, since most of the same things can go wrong. On top of that, it makes the machine rather useless.
2) How could the observer effect possibly allow the minimization of influence by the use of advanced influence?
With great difficulty. It’s harder to fix things than it is to break them. It remains possible—the utility function seems to be MIN(DIFF(expected future, expected future in some arbitrarily defined NULL universe)). A minimisation of net influence. That does permit influence that reduces the difference that previous influence caused.
The observer effect doesn’t prevent “more influence to minimise net influence”—it just gives a hard limit on how low that minimum can be once a change has been made.
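To make the shape of that objective concrete, here is a minimal sketch in Python. It is purely illustrative: `WorldState`, `diff`, and `predict` are hypothetical stand-ins for whatever world model and distance measure such an Oracle would actually use, and nothing here comes from the original proposal.

```python
# Minimal sketch of MIN(DIFF(expected future, expected future in a NULL universe)).
# Everything here is a toy placeholder, not part of the original proposal.
from typing import Callable, Dict, List

WorldState = Dict[str, float]  # toy "expected future": named features and their values


def diff(a: WorldState, b: WorldState) -> float:
    """Toy distance between two expected futures: sum of absolute feature differences."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def net_influence(action: str, predict: Callable[[str], WorldState], null_future: WorldState) -> float:
    """Penalty for an action: how far the expected future drifts from the NULL universe."""
    return diff(predict(action), null_future)


def choose_action(actions: List[str], predict: Callable[[str], WorldState], null_future: WorldState) -> str:
    """Pick the action minimising net influence.  An action that spends effort undoing
    earlier influence can win, provided it shrinks the gap to the NULL universe by more
    than it adds; but once a change has been made, the achievable minimum is not zero."""
    return min(actions, key=lambda a: net_influence(a, predict, null_future))
```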
It would take massive resources to make the universe proceed as if the Oracle had never changed the path of history. But the use of massive resources is itself a huge change. So why wouldn’t the Oracle simply turn itself off?
POSSIBLE OUTPUTS: YES; NO;
… there is no option that doesn’t have the potential to massively change the universe. That includes the decision to turn off.
If you have programmed it with a particularly friendly definition of what “don’t change stuff” actually means, then hopefully that is what the Oracle does. But even then we must remember that “simply turning itself off” is not a neutral act. Turning itself off does change things. In fact I would expect the decision to turn itself off to have more far-reaching consequences than most answers that the oracle could give about Turing Machines halting. If you deny the AI creator his functioning oracle, you have caused the AI creator to proceed to a different plan. That probably involves creating a different AI prototype with different restrictions—and the behavior of that AI is something the oracle cannot control!
Once again we are encountering the general problem. When an oracle is given a decision:
POSSIBLE OUTPUTS: YES; NO;
… all the options have consequences, potentially drastic consequences. Giving no response (including no response ever, via turning off) is not necessarily the option with the least drastic consequences.
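The same point in toy code, with invented impact numbers: turning off is scored alongside the other outputs rather than being treated as a free pass.

```python
# Toy illustration only -- the impact scores below are invented for the example.
impact = {
    "YES": 0.7,       # consequences of answering YES
    "NO": 0.4,        # consequences of answering NO
    "TURN_OFF": 1.3,  # never answering; the creators move on to a different, unconstrained AI
}

least_drastic = min(impact, key=impact.get)
print(least_drastic)  # -> "NO": turning off is not automatically the least drastic option
```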
Once again we are encountering the general problem. When an oracle is given a decision: POSSIBLE OUTPUTS: YES; NO; … all the options have consequences, potentially drastic consequences.
If we’re just dealing with an Oracle, we can pipe the actual answer through some version of utility indifference (slightly more subtle, as the measure of reduced impact doesn’t look much like a utility function).
For a general agent, though, I think “can this work if we magically assume there are no major social consequences” is a fair question to ask, and a “yes” would be of great interest. After that, we can drop the assumption and see if that’s solvable.
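For the Oracle case, here is a rough sketch of what piping the answer through utility indifference could look like, assuming the kind of setup discussed below, where a low-probability random event replaces the Oracle’s answer with NULL. The probabilities and utilities are placeholders, and this is only the simplest expected-utility version; as noted, the real reduced-impact measure doesn’t look much like a utility function.

```python
# Hedged sketch of utility indifference, in its simplest expected-utility form.
# Assumption (not from the source): with probability p_null a random event replaces
# the Oracle's answer with NULL before anyone sees it.

def corrected_expected_utility(p_null: float, eu_delivered: float, eu_null: float) -> float:
    """Add a constant to the 'answer delivered' branch so that both branches are worth
    the same; the Oracle then has nothing to gain by steering which branch occurs."""
    correction = eu_null - eu_delivered
    eu_delivered_corrected = eu_delivered + correction  # equals eu_null by construction
    return p_null * eu_null + (1.0 - p_null) * eu_delivered_corrected
```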
But even then we must remember that “simply turning itself off” is not a neutral act. Turning itself off does change things.
The null action was defined as a case where the AI outputs NULL. (Where a random event transforms the AI’s output to NULL, actually.) So if the AI outputs NULL, we know what happened and will act accordingly, but the AI doesn’t get penalized because (provided we incinerated all traces of the AI’s reasoning) this is the same thing that we would have done if the AI’s output had been randomly transformed into NULL.
Also, note that the proposal involved coarse graining. We can (attempt to) adopt a coarse graining that ignores all of our reactions to the AI’s output.
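A last sketch, combining the two points above: the penalty is computed only on a coarse-grained view of the world that (hypothetically) filters out our reactions to the AI’s output, and the baseline is the counterfactual in which the output was randomly transformed into NULL. The feature names and the tagging of “reaction” features are invented for illustration.

```python
# Illustrative sketch of the coarse-graining idea.  Feature names and the set of
# "reaction" features are hypothetical; nothing here is the actual proposal.

REACTION_FEATURES = {"operator_plans", "lab_next_project"}  # our reactions to the AI's output


def coarse_grain(state: dict) -> dict:
    """Project a world state onto the features the impact measure is allowed to see,
    ignoring everything classed as a reaction to the AI's output."""
    return {k: v for k, v in state.items() if k not in REACTION_FEATURES}


def penalty(expected_state: dict, expected_state_if_null: dict) -> float:
    """Impact relative to the branch where the output was randomly transformed into NULL,
    measured only on the coarse-grained states.  Outputting NULL then scores (near) zero
    even though we would notice and act on the lack of an answer."""
    a, b = coarse_grain(expected_state), coarse_grain(expected_state_if_null)
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
```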