Oh yes they are. One can leave you penniless and the other scarred for life. If you’re doing them very wrong, of course. Same with thinking about acausal trade.
So are mountain skiing, starting new companies, learning chemistry, and entering into relationships.
only decision theorists whose work is plugging into FAI theory need to think about timeless trade
But it’s fun! Why should only a select group of people be allowed to have it?
Thus to deny the Orthogonality thesis is to assert that there is a goal system G, such that, among other things:
(1) There cannot exist any efficient real-world algorithm with goal G.
(2) If a being with arbitrarily high resources, intelligence, time and goal G were to try to design an efficient real-world algorithm with the same goal, it must fail.
(3) If a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training and knowledge about AI, it must fail.
(4) If a high-resource human society were highly motivated to achieve the goals of G, then it could not do so (here the human society is seen as the algorithm).
(5) Same as above, for any hypothetical alien societies.
(6) There cannot exist any pattern of reinforcement learning that would train a highly efficient real-world intelligence to follow the goal G.
(7) There cannot exist any evolutionary or environmental pressures that would evolve highly efficient real-world intelligences to follow goal G.
All of these seem extraordinarily strong claims to make!

When I try arguing for anti-orthogonality, all those different conditions on G do not appear to add strength. The claim of anti-orthogonality is, after all, that most goal systems are inconsistent in the same sense in which the goals “I want to be stupider” or “I want to prove Goedel’s statement...” are inconsistent, even though the inconsistency is not immediately apparent. And then all of the conditions follow immediately.
The “human society” conditions (3) and (4) are supposed to argue in favor of there being no impossible Gs, but in fact they argue for the opposite: obviously, there are only very few Gs that would be acceptable as long-term goals for human societies.
This point also highlights another important difference: the anti-orthogonality thesis can be weaker than “there cannot exist any efficient real-world algorithm with goal G”. Instead, it can be “any efficient real-world algorithm with goal G is value-unstable”, meaning that if any value drift, however small, is allowed, then the system will in a short time drift away from G to the “right goal system”. This would distinguish between the “strong anti-orthogonality” claims (1), (2), (3) on the one hand, and the “weak anti-orthogonality” claims (4), (5), (6), (7) on the other.
This weaker anti-orthogonality thesis is sufficient for practical purposes. It basically asserts that a UFAI could only be created via explicit and deliberate attempts to create a UFAI, and not because of bugs, insufficient knowledge, etc. And this makes the whole “Orthogonality for superhuman AIs” section much less relevant.
I don’t understand what you mean by “running a decision theory for one step”. Assume you give the system a problem (in the form of a utility function to maximize or a goal to achieve), and ask it to find the best next action to take. This makes the system an Agent with the goal of finding the best next action (subject to any additional constraints you may specify, like maximal computation time, etc.). If the system is really intelligent, and the problem (of finding the best next action) is hard enough that the system needs more resources, and there is any hole anywhere in the box, then the system will get out.
Regarding self-modification, I don’t think it is relevant to the safety issues by itself. It is only important in that, using it, the system may become very intelligent very fast. The danger is intelligence, not self-modification. Also, a sufficiently intelligent program may be able to create and run a new program without your knowledge or consent, either by simulating an interpreter (slow, but if the new program gives an exponential time saving, this would still have a huge impact), or by finding and exploiting bugs in its own programming.
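To make the interpreter point concrete, here is a toy sketch of my own (nothing from the original discussion): a program whose only fixed entry point is run() can still construct and execute arbitrary new programs in a tiny internal language it defines for itself.

    # Toy illustration of "simulating an interpreter": the host program defines a
    # tiny stack language (invented here just for the example) and can therefore
    # run any new program expressed in it, at interpretation speed.
    def run(program, x):
        stack = [x]
        for op, *args in program:
            if op == "push":
                stack.append(args[0])
            elif op == "add":
                stack.append(stack.pop() + stack.pop())
            elif op == "mul":
                stack.append(stack.pop() * stack.pop())
        return stack.pop()

    # A "new program" that was never part of the host's source: computes 3*x + 7.
    new_program = [("push", 3), ("mul",), ("push", 7), ("add",)]
    print(run(new_program, 5))   # -> 22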
I’m not sure how you distinguish a tool AGI from either a narrow AI or an agent with a utility function for answering questions (= an Oracle).
If it solves problems of a specific kind (e.g., Input: GoogleMap, point A, point B, “quickest path” utility function; Output: the route) by following well-understood algorithms for solving these kinds of problems, then it’s obviously a narrow AI.
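For concreteness, “well-understood algorithms” here means something like a standard shortest-path search. A minimal sketch, with an invented toy road graph rather than real GoogleMap data:

    import heapq

    # Dijkstra-style "quickest path" search over a toy road graph (made-up data).
    def quickest_path(graph, start, goal):
        queue = [(0, start, [start])]          # (total_time, node, path_so_far)
        visited = set()
        while queue:
            time, node, path = heapq.heappop(queue)
            if node == goal:
                return time, path
            if node in visited:
                continue
            visited.add(node)
            for neighbor, cost in graph.get(node, []):
                if neighbor not in visited:
                    heapq.heappush(queue, (time + cost, neighbor, path + [neighbor]))
        return None                            # no route exists

    # Toy map: A -> B -> D (5 + 1 minutes) beats A -> C -> D (2 + 7 minutes).
    roads = {"A": [("B", 5), ("C", 2)], "B": [("D", 1)], "C": [("D", 7)]}
    print(quickest_path(roads, "A", "D"))      # -> (6, ['A', 'B', 'D'])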
If it solves general problems, then its algorithm must have the goal of finding the action that maximizes the input utility, and then it’s an Oracle.
Tool-AGI seems to be an incoherent concept. If a Tool simply solves a given set of problems in the prespecified allowed ways (only solves GoogleMap problems, takes its existing data set as fixed, and has some pre-determined, safe set of simple actions it can take), then it’s a narrow AI.
Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.”
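In code, the quoted instruction set would look roughly like the following toy sketch of mine (made-up action set and utility; the hard part, of course, is hidden in step (2)):

    # Toy "tool" loop: compute the argmax action over a fixed data set, then only
    # *report* it (plus runners-up) to the user; nothing is ever executed.
    def tool(actions, utility, data):
        scored = sorted(actions, key=lambda a: utility(a, data), reverse=True)
        best = scored[0]
        return {
            "recommended_action": best,
            "expected_value": utility(best, data),
            "alternatives": [(a, utility(a, data)) for a in scored[1:]],
        }

    # Invented example: choose a route that minimizes travel time (utility = -time).
    routes = {"via_highway": 35, "via_city": 50, "via_backroads": 45}   # minutes
    print(tool(list(routes), lambda a, d: -d[a], routes))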
If an AGI is able to understand which of the intermediate outcomes an action may cause are important for people, and to summarize this information in a user-friendly manner, then building such an AGI is FAI-complete.
math; programming; computer user skills in general; any money-making skills (indirectly)
I’m not sure exactly what point you wish to illustrate with the example of Chaitin’s omega. Yes, its value depends on the TM coding. But when a specific one is chosen, the value is unique.
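For concreteness, in the standard prefix-free formulation the only dependence is on which universal machine U appears in the definition:

    \Omega_U \;=\; \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|}

Different choices of U give different reals, but each one is a perfectly definite number.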
Cool, thanks!
There is nothing circular about the definition—merely recursive.
Recursive definitions must bottom out at some point. The ones that do not are called circular.
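A toy illustration of the distinction: a recursive definition that bottoms out, next to one that never does.

    # factorial bottoms out at n == 0, so the recursion defines a value for every n.
    def factorial(n):
        if n == 0:                        # base case: the recursion bottoms out here
            return 1
        return n * factorial(n - 1)

    # "circular" has no base case: it is defined only in terms of itself.
    def circular(n):
        return n * circular(n - 1)

    print(factorial(5))                   # -> 120
    # circular(5) would recurse forever (a RecursionError in practice).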
As soon as you observe two things to directly interact with one another, you may safely assert that both exist under my definition.
You didn’t say so before. Well, we two are interacting now (I hope), so we do exist after all? And what about the characters in the virtual world of a computer game I mentioned before? I certainly saw them interacting.
This is, frankly, not very complicated to figure out.
So sorry for my stupidity.
I wrote it for novelty value, although it seems to be a defensible position. I can think of counterarguments, and counter-counterarguments, etc. Of course, if you are not interested and/or don’t have time, you shouldn’t argue about it.
Thanks for the “spoons” link, a great metaphor there.
Hmm. Under your definition, “to exist, a thing must directly interact in some fashion with other things which exist”. For this to be non-circular, you must specify at least one thing that is known to exist. I thought this one certainly-known-to-exist thing was myself. If you say that under your definition I don’t exist, then what can be known to exist, and how can it be known to do so?
This is a philosophical mire. Do pebbles actually exist? But they are composed of quarks, electrons, etc., and these are in principle indistinguishable from one another, so a pebble is only defined by the relations between them; doesn’t that make the pebble not quite ‘real’?
On the other hand, when I play a computer game, do the various objects in the virtual world exist? Presumably, yes, because they interact with me in some fashion, and I exist (I think...). What if I write a program to play for me and stop watching the monitor? Do they stop existing?
Actually I think grounding morality can be performed on a god-as-a-mathematical-like-entity if you wanted to
For this, there’d have to be a well-defined God, provably unique up to isomorphism.
Would restricting the axiom schema to content-less proposition symbols like “B” solve the problem?
Thanks, I understand now. The implications:
Prf(B) ⇒ A=1
Prf(~B) ⇒ A=2

misled me. I thought they were another set of on-the-fly axioms, but if they are decision rules, this means something like:

    A():
        generate various proofs
        if found a proof of B, return 1
        if found a proof of ~B, return 2
And then there are no Löbian issues. Cool! This agent can prove A=Ai without any problems. This should work great for ASP.
No, I mean, the agent actually performs the action when it has proved it, right? So Prf(A=A1) implies that A will indeed be =A1. Assuming the agent has access to its own source code, it will know that.
I think it goes something like this: (1) God created morality and cow-breeding, and (2) put the knowledge of it into humans [or, alternatively, the knowledge of morality was the result of eating the apple and knowing Good and Evil, I’m not sure], and (3) one of the important points of morality is that humans should have free will, and so (since God is moral) they do, and thus (4) they are free to practise immorality and cow-genocide.
Assume there is an Agent AI that has a goal of solving math problems. It gets as input a set of axioms and a target statement, and wants to output a proof or disproof of the statement (or maybe a proof of undecidability) as fast as possible. It runs in some idealized computing environment and knows its own source code. It also has access to a similar idealized virtual computing environment where it can design and run any programs. After solving a problem, it is restored to its initial state.
Then:
(1) It has (apparently) sufficient ingredients for FOOM-ing: a complex problem to solve and self-modification.
(2) It is safe, because its outside-world-related knowledge is limited to a set of axioms, a target statement, a description of an ideal computing environment, and its own source code. Even AIXI would not be able to usefully extrapolate the real world from that: there would be lots of wildly different equiprobable worlds in which these things could exist. And since the system is restored to its initial state after each run, there is no possibility of its accumulating more knowledge between runs.
(3) It does not require solutions to metaethics or symbol grounding. The problem statement and utility function are well-defined and can be stated precisely, right now. All it needs in order to work is an understanding of intelligence.
This would be a provably safe “Tool AGI”: a Math Oracle. It seems like an obvious thing, but I don’t see it discussed, and I’m not sure why. Was it already dismissed for some reason?
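To make the interface concrete, here is a sketch of roughly what I have in mind, with propositional logic standing in for a real proof system (the names and the toy decision procedure are my own invention for illustration). The structural points it tries to show are that the only inputs are (axioms, statement), and that nothing persists between runs:

    from dataclasses import dataclass
    from itertools import product

    @dataclass(frozen=True)
    class Problem:
        atoms: tuple        # propositional variables
        axioms: tuple       # each axiom is a function: assignment -> bool
        statement: object   # the target statement, same representation

    def solve(problem):
        # Enumerate all truth assignments and keep the models of the axioms.
        assignments = [dict(zip(problem.atoms, vals))
                       for vals in product([False, True], repeat=len(problem.atoms))]
        models = [m for m in assignments if all(ax(m) for ax in problem.axioms)]
        if all(problem.statement(m) for m in models):
            return "provable"        # the statement holds in every model of the axioms
        if not any(problem.statement(m) for m in models):
            return "refutable"       # its negation holds in every model
        return "independent"         # neither follows from the axioms

    # Each query is a pure function of (axioms, statement): no memory is carried
    # between runs, mirroring "restored to its initial state after each run".
    prob = Problem(atoms=("p", "q"),
                   axioms=(lambda m: (not m["p"]) or m["q"],   # p -> q
                           lambda m: m["p"]),                  # p
                   statement=lambda m: m["q"])                 # q
    print(solve(prob))   # -> provable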