A minor qualm that does not impact your main point. From this quotation of Bostrom:
We can tentatively define a superintelligence as any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest
You deduce:
So, the singularity claim assumes a notion of intelligence like the human one, just ‘more’ of it.
That’s too narrow an interpretation. Bostrom’s definition only states that the superintelligence outperforms humans on all intellectual tasks; its inner workings could be totally different from human reasoning.
Others here will be able to discuss your main point better than I can (edit: but I’ll have a go at it as a personal challenge). I think the central point is one you mention in passing: the difference between instrumental goals and terminal values. An agent’s terminal values should be expressible as a utility function; otherwise those values are incoherent and the agent is open to Dutch-booking. We humans are incoherent, which is why we often confuse instrumental goals with terminal values, and why we need to force ourselves to think rationally or we become vulnerable to Dutch-booking ourselves. The utility function is absolute: if an agent’s utility function is to maximize the number of paperclips, no reasoning about ethics will make it value some instrumental goal over that. I’m not sure the agent is totally protected against wireheading, though (convincing itself it’s fulfilling its values rather than actually doing it).
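To make the Dutch-booking (money-pump) point concrete, here’s a minimal sketch; the items, the fee, and the trading setup are all made up for illustration. An agent whose preferences cycle A > B > C > A will pay a little for every swap it “prefers”, so a trader can walk it around the cycle and drain it without ever giving it anything new:

```python
# Minimal money-pump sketch (illustrative only; items and fee are invented).
# Cyclic preferences A > B > C > A mean the agent pays for each "upgrade",
# yet ends up back where it started, poorer each time around the cycle.

FEE = 1  # what the agent is willing to pay for each preferred swap

def money_pump(start_item, rounds):
    """Cycle the agent through swaps it prefers, collecting a fee each time."""
    # held item -> the item the agent strictly prefers to it (note the cycle)
    prefers = {"B": "A", "C": "B", "A": "C"}
    item, paid = start_item, 0
    for _ in range(rounds):
        item = prefers[item]  # the agent accepts the upgrade it prefers...
        paid += FEE           # ...and pays the fee, every single time
    return item, paid

item, paid = money_pump("A", rounds=9)
print(f"After 9 trades the agent holds {item} again but has paid {paid}.")
```

A coherent utility function rules this out: whatever numbers it assigns to A, B, and C, at least one swap in the cycle would lower utility and be refused.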
It’d be nice if we could implement our values as the agent’s terminal values, but that turns out to be immensely difficult (look for articles with ‘genie’ here). Forget Asimov’s Three Laws: the First Law alone is irredeemably ambiguous. How far should the agent go to protect human lives? What counts as a human? It might turn out more convenient for the agent to turn mankind into brains in jars and store them eternally in a bunker for maximum safety.