It is a fragile machine, much more delicate than any animal, with some knowledge of itself, and a rough map of some parts of the external world.
This may well be true at boot-up. But once the AI has existed for a bit, it is a very robust machine, built out of the most resilient components the AI can design and far tougher than any animal. It has a very detailed map of almost all of the external world.
The AGI requires continuous support from the world outside of it, which it can’t fully see or understand.
The AI requires negentropy, and that's about it.
Any one of its many pieces might break, at any time. Unlike an animal, it is unlikely to be made of self-repairing, self-sustaining molecular machines, each of which can independently heal themselves, arranged in mutually cooperative networks. This gives rise to the primary risk to the AGI: that it would break down.
The AI is made of nanomachines far better designed than anything in biology. It has multiply redundant everything and very good self-repair mechanisms. It estimates the chance of a breakdown as somewhere around 10^-40, and can lower that further with yet more defences.
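To illustrate the kind of arithmetic behind a figure like 10^-40 (the per-copy failure rate and number of copies below are made-up assumptions for illustration, not figures from the discussion):

```python
# Toy model of redundancy: a subsystem only fails if every independent
# redundant copy of it fails. All numbers here are illustrative assumptions.

def total_failure_probability(p_single: float, copies: int) -> float:
    """Probability that all `copies` independent redundant copies fail."""
    return p_single ** copies

# E.g. a per-copy failure chance of 1e-4 and 10 independent copies:
print(total_failure_probability(1e-4, 10))  # ~1e-40
# Adding more copies (or lowering the per-copy rate) pushes this lower still.
```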
But how can it replace the broken CPU without help from operators in the datacenter that houses it? How can it know the health of the manufacturer of that CPU? What will happen if the global economy breaks down, international trade trends revert to isolationism, and the particular kind of CPU that the AGI needs is no longer available for purchase?
Relying on human-produced chips is a weakness the AI will have at first, but one it will quickly remedy.
If the map of the AGI is incorrect in this regard, the AGI might inadvertently destroy some part of itself, believing it was consuming resources it was free to use.
The AI is not a total idiot.
But how does the AGI understand that power plant?
At least as well as you do. It recognises that it needs electricity, and so won't destroy existing power plants until it has made a better source of energy. What, you think a superintelligence is just going to sit there watching an old human-built coal power station being maintained by humans? That it won't aim for a robotically maintained fusion reactor?
Supporting the agents in its dependency chain is therefore likely to be a convergent instrumental subgoal unconsidered by the original paper.
Having other agents in your dependency chain is generally a bad thing. At the very least, humans aren’t as efficient workers as advanced robots.
It creates a new life form which might be a competitor of the first!
So now you are arguing that the paperclip-maximizer AI will cooperate with humans, and help them, despite having totally different goals, yet will be so scared of an exact copy of itself that it will refuse to self-duplicate.
You are arguing both “Other agents are so helpful that even agents with totally different utility functions will help the AI”.
And also “Other agents are so dangerous that even an exact copy of itself is too big a risk.”
Each one then concludes that the other one has been damaged and is behaving incorrectly. How can the two of them possibly resolve this situation?
Three AIs that take a majority vote? A complicated cryptographic consensus protocol?
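For concreteness, a minimal sketch of the majority-vote idea (purely illustrative; the three-replica setup and the example answers are my assumptions, not anything from the post):

```python
from collections import Counter

def majority_vote(answers):
    """Return the answer a strict majority of replicas agree on, or None if there is no majority."""
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes > len(answers) / 2 else None

# Three replicas compute the same decision; one has been damaged and disagrees.
replica_answers = ["replace unit 7", "replace unit 7", "do nothing"]
print(majority_vote(replica_answers))  # -> "replace unit 7": the damaged replica is outvoted
```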
In other words, in order to avoid killing itself, and simply to reduce the chance that it will be destroyed by itself or its environment, the AGI should instrumentally support all complex agents around it.
The AI can have a very good idea of exactly which agents it wants to support, and humans won’t be on that list.
Ah, the old "corporations are already superintelligences, therefore ...?" argument. It's like someone interrupting a discussion on asteroid risks by pointing out that landslides are already falling rocks that cause damage. The existence of corporations doesn't stop AI from also being a thing. There are some rough analogies, but also a lot of differences.
This post seems to be arguing against an approximation in which AIs are omniscient by instead imagining that the AIs are stupid, and getting even less accurate results.
What if it is the case that no agent can ever be fully aligned? What if any utility function, stamped out on the world enough times, will kill all other life?
Directly contradicts previous claims. Probably not true.
Perhaps a singular focus on giving a single agent unlimited license to pursue some objective obscures the safest path forward: ensure a panoply of different agents exist.
This doesn't seem obviously safer. If you have a paperclip maximizer and a staple maximizer, humans may end up caught in the crossfire between two ASIs.
how would we do THAT without a global dictatorship?
A superintelligent AI with the goal of stopping other AIs.
or, if the thesis of this paper is correct, simply waiting for large unaligned agents to accidentally kill themselves.
Once again, this assumes the AI is an idiot. Saying the AI has to be somewhat careful to avoid accidentally killing itself? Fair enough. Saying it has a 0.1% chance of killing itself anyway? Not totally stupid. But this "strategy" only works if ASIs reliably accidentally kill themselves. The 20th ASI is created, sees the wreckage of all the previous ASIs, and still inevitably kills itself? Pull the other one, it has bells on.
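To put rough numbers on why that "strategy" fails, here is the arithmetic under the hypothetical 0.1%-per-ASI figure above (treating each ASI's accident as independent, which is my assumption):

```python
# Illustrative arithmetic only: the 0.1% self-destruction chance is the figure
# floated above, and independence between successive ASIs is an assumption.
p_self_destruct = 0.001

for n in (1, 5, 20):
    # Probability that the first n ASIs ALL accidentally destroy themselves.
    print(n, p_self_destruct ** n)
# 1 -> 1e-3, 5 -> 1e-15, 20 -> 1e-60: "wait for them to kill themselves"
# requires an absurdly unlikely losing streak.
```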