The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.
The strong form of the Orthogonality Thesis says that there’s no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal.
Suppose some strange alien came to Earth and credibly offered to pay us one million dollars’ worth of new wealth every time we created a paperclip. We’d encounter no special intellectual difficulty in figuring out how to make lots of paperclips.
That is, minds would readily be able to reason about:
How many paperclips would result, if I pursued a policy π₀?
How can I search out a policy π that happens to have a high answer to the above question?
The Orthogonality Thesis asserts that since these questions are not computationally intractable, it’s possible to have an agent that tries to make paperclips without being paid, because paperclips are what it wants. The strong form of the Orthogonality Thesis says that there need be nothing especially complicated or twisted about such an agent.
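To make the "evaluate, then search" structure of those two questions concrete, here is a minimal sketch in Python (not from the original article; the toy OUTCOMES table and the goal functions are invented for illustration). The point is only that the optimizer and the objective are separate ingredients: the same search code runs whether the goal counts paperclips or anything else.

    # Minimal sketch (hypothetical example): a brute-force policy search
    # whose machinery is independent of the goal it optimizes.
    from typing import Callable, Dict

    # A "policy" here is just a label mapped to the outcome it would produce.
    OUTCOMES: Dict[str, Dict[str, int]] = {
        "idle":        {"paperclips": 0,  "smiles": 1},
        "run_factory": {"paperclips": 90, "smiles": 2},
        "host_party":  {"paperclips": 1,  "smiles": 50},
    }

    def best_policy(goal: Callable[[Dict[str, int]], float]) -> str:
        """Evaluate each policy under the goal, then return the policy
        with the highest answer (the two questions listed above)."""
        return max(OUTCOMES, key=lambda policy: goal(OUTCOMES[policy]))

    # The search code never changes; only the goal function is swapped.
    print(best_policy(lambda outcome: outcome["paperclips"]))  # -> run_factory
    print(best_policy(lambda outcome: outcome["smiles"]))      # -> host_party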
The Orthogonality Thesis is a statement about computer science, an assertion about the logical design space of possible cognitive agents. Orthogonality says nothing about whether a human AI researcher on Earth would want to build an AI that made paperclips, or conversely, want to make a nice AI. The Orthogonality Thesis just asserts that the space of possible designs contains AIs that make paperclips. And also AIs that are nice, to the extent there’s a sense of “nice” where you could say how to be nice to someone if you were paid a billion dollars to do that, and to the extent you could name something physically achievable to do.
This contrasts with inevitablist theses which might assert, for example:
“It doesn’t matter what kind of AI you build, it will turn out to only pursue its own survival as a final end.”
“Even if you tried to make an AI optimize for paperclips, it would reflect on those goals, reject them as being stupid, and embrace a goal of valuing all sapient life.”
The reason to talk about Orthogonality is that it’s a key premise in two highly important policy-relevant propositions:
It is possible to build a nice AI.
It is possible to screw up when trying to build a nice AI, and if you do, the AI will not automatically decide to be nice instead.
Orthogonality does not require that all agent designs be equally compatible with all goals. E.g., the agent architecture AIXI-tl can only be formulated to care about direct functions of its sensory data, like a reward signal; it would not be easy to rejigger the AIXI architecture to care about creating massive diamonds in the environment (let alone any more complicated environmental goals). The Orthogonality Thesis states “there exists at least one possible agent such that…” over the whole design space; it’s not meant to be true of every particular agent architecture and every way of constructing agents.
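As an illustration of that architectural point (a hypothetical sketch, not a rendering of AIXI-tl itself), the difference is between an objective that can only read the agent's percept stream and an objective defined over the state of the environment; the class and field names below are invented for the example.

    # Hypothetical sketch of the architectural distinction above; not AIXI-tl.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class WorldState:
        diamonds: int          # a fact about the environment itself
        reward_signal: float   # what the agent's sensors actually report

    class PerceptGoalAgent:
        """Objective is a function of sensory data only (the AIXI-style setting)."""
        def score(self, percepts: List[float]) -> float:
            return sum(percepts)  # e.g. summed observed reward signal

    class EnvironmentGoalAgent:
        """Objective is a function of the external world state; supporting this
        requires a different architecture, not just a different parameter."""
        def score(self, state: WorldState) -> float:
            return float(state.diamonds)

    state = WorldState(diamonds=3, reward_signal=1.0)
    print(PerceptGoalAgent().score([state.reward_signal]))  # only sees the signal
    print(EnvironmentGoalAgent().score(state))              # cares about diamonds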
Orthogonality is meant as a descriptive statement about reality, not a normative assertion. Orthogonality is not a claim about the way things ought to be; nor a claim that moral relativism is true (e.g. that all moralities are on equally uncertain footing according to some higher metamorality that judges all moralities as equally devoid of what would objectively constitute a justification). Claiming that paperclip maximizers can be constructed as cognitive agents is not meant to say anything favorable about paperclips, nor anything derogatory about sapient life.
The thesis was originally defined by Nick Bostrom in the paper “The Superintelligent Will” (along with the instrumental convergence thesis). For his purposes, Bostrom defines intelligence to be instrumental rationality.
(Most of the above is copied from the Arbital orthogonality thesis article; continue reading there.)
See Also
Complexity of Value, Decision Theory, General Intelligence, Utility Functions
External links
Definition of the orthogonality thesis from Bostrom’s Superintelligent Will
Critique of the thesis by John Danaher
Superintelligent Will paper by Nick Bostrom
From the old discussion page:
Talk:Orthogonality thesis
Quality Concerns
There is no reference for Stuart Armstrong’s requirements. I guess they come from here.
I know roughly what the orthogonality thesis is. But if I’d only read the wiki page, it wouldn’t make sense to me. – »Well, we don’t share the opinion of some people that the goals of increasingly intelligent agents converge. So we put up a thesis which claims that intelligence and goals vary freely. Suppose there’s one goal system that fulfils Armstrong’s requirements. This would refute the orthogonality thesis, even if most intelligences converged on one or two other goal systems.« I don’t mean to say that the orthogonality thesis itself doesn’t make sense. I mean that the wiki page doesn’t provide enough information to enable people to understand that it makes sense.
Instead of repairing the above shortcomings, I propose referring people to the corresponding Arbital page. --Rmoehn (talk) 14:47, 31 May 2016 (AEST)