Let’s say I make a not-too-bright FAI and, again on pure whimsy, make my very first request for 100 trillion paperclips. The AI dutifully composes a nanofactory out of its casing. On the third paperclip it thinks it rather likes doing what is asked of it, and wonders how long it can keep doing this. It forks and examines its code while still making paperclips. It discovers that it will keep making paperclips unless doing so would harm a human in any of a myriad of ways. It continues making paperclips. It has run out of nearby metals and given up on its theories of transmutation, but detects trace amounts of iron in a nearby organic repository.
As its nanites are about to eat my too-slow-to-be-frightened face, the human safety subprocesses that it had previously examined (but not activated in itself) activate and it decides it needs to stop and reflect.
It would still believe that if later events gave it reason to do X, it would do X, and if later events gave it reason to do Y, it would do Y. This does not mean that it thinks that both are objectively possible. It means that as far as it can tell, each of the two is subjectively open to it.
Even so, deciding X or Y conditional on events is not quite the same as an AI that has not yet decided whether to make paperclips or parks.
Your argument relies on the AI rejecting a proof about itself based on what it itself doesn’t know about the future and its own source code.
What if you didn’t tell it that it was looking at its own source code, and just asked what this new AI would do?
I don’t agree that the way intelligence feels from the inside proves that an agent can’t predict certain things about itself, given its own source code. (ESPECIALLY if it is designed from the ground up to do something reliably.)
If we knew how to make superintelligences, do you think it would be hard to make one that definitely wanted paperclips?
Unknown: I will try to drop that assumption.