Assume there is an Agent AI that has a goal of solving math problems. It gets as input a set of axioms and a target statement, and wants to output a proof or disproof of the statement (or maybe a proof of undecidability) as fast as possible. It runs in some idealized computing environment and knows its own source code. It also has access to a similar idealized virtual computing environment where it can design and run any programs. After solving a problem, it is restored to its initial state.
Then:
(1) It has (apparently) sufficient ingredients for FOOM-ing: complex problem to solve and self-modification.
(2) It is safe, because its outside-world-related knowledge is limited to a set of axioms, a target statement, a description of an ideal computing environment, and its own source code. Even AIXI would not be able to usefully extrapolate the real world from that: there would be a vast number of wildly different, equiprobable worlds in which these things could exist. And since the system is restored to its initial state after each run, it has no way to accumulate knowledge between runs.
(3) It does not require solutions to metaethics, or symbol grounding. The problem statement and utility function are well-defined and can be stated precisely, right now. All it needs to work is understanding of intelligence.
This would be a provably safe “Tool AGI”: a Math Oracle. It seems like an obvious idea, but I don’t see it discussed, and I’m not sure why. Was it already dismissed for some reason?
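To make the proposal concrete, here is a minimal sketch of the interface such a Math Oracle might expose. The type and function names (Problem, Result, run_oracle, the cycle counter) are illustrative assumptions of mine, not part of the proposal itself:

```python
from dataclasses import dataclass
from enum import Enum, auto


class Verdict(Enum):
    PROVED = auto()       # found a proof of the target statement
    DISPROVED = auto()    # found a proof of its negation
    UNDECIDABLE = auto()  # found a proof that the target is independent of the axioms


@dataclass(frozen=True)
class Problem:
    axioms: tuple[str, ...]  # axioms, encoded in some fixed formal language
    target: str              # the statement to settle


@dataclass(frozen=True)
class Result:
    verdict: Verdict
    certificate: str         # a machine-checkable proof object
    cycles_used: int         # virtual-environment cycles consumed


def run_oracle(problem: Problem) -> Result:
    """Run the oracle on a single problem.

    The oracle is restored to its initial state after each call,
    so nothing it learns persists between problems.
    """
    raise NotImplementedError("stands in for the (unspecified) oracle internals")
```

The only point of the sketch is that the oracle’s entire input is the axioms and the target (plus its own source code and the idealized environment); nothing about the outside world appears anywhere in the interface.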
The utility of such systems is crucially constrained by the relevant outside-world-related knowledge you feed into them.
If you feed in only some simple, general math axioms, then the system is limited to discovering results in abstract mathematics. While useful, this isn’t going to change the world.
It only starts getting interesting when you seed it with some physics knowledge. AGIs in sandboxes that have real physics but zero specific earth knowledge are still tremendously useful for solving all kinds of engineering and physics problems.
Eventually, though, if you give it enough specific knowledge about the earth, and humans in particular, it could become very dangerous.
This type of provably safe boxed AI has been discussed before; specifically, I proposed it here in one of my first main posts, which was also one of my lowest-scoring posts.
I still think that virtual sandboxes are the most promising route to safety and that they haven’t been given enough serious consideration here on LW. At some point I’d like to retry that discussion, now that I understand LW etiquette a little better.
Solving problems in abstract mathematics can be immensely useful even by itself, I think. Note: physics knowledge at low levels is indistinguishable from mathematics. But the main use of the system would be safely studying the behavior of a (super-)intelligence, in preparation for a true FAI.
Agreed. But the package of ideas entailed by AGI centers on systems that use human-level reasoning, understand natural language, and solve the set of AI-complete problems. The AI-complete problem set can be reduced to finding a compact generative model for natural-language knowledge, which really means finding a compact generative model for the universe we observe.
Regarding physics knowledge at low levels being indistinguishable from mathematics: not quite. Abstract mathematics is too general. Useful “physics knowledge” is a narrow subset of mathematics that compactly describes the particular universe we observe. This specificity is both crucial and potentially dangerous.
As for safely studying the behavior of a (super-)intelligence: a super-intelligence (super-intelligent relative to us) will necessarily be AI-complete, and thus it must know of our universe. Any system that hopes to understand such a super-intelligence must likewise know of our universe, simply because “super-intelligent” really means “having super-optimization power over this universe”.
By (super-)intelligence I mean EY’s definition: a powerful general-purpose optimization process. It does not need to actually know about natural language or our universe to be AI-complete; the potential to learn them is sufficient. Abstract mathematics is arbitrarily complex, so a sufficiently powerful optimization process in this domain will have to be general enough for everything.
In theory we could all be living inside an infinite Turing simulation right now. In practice, any super-intelligence in our universe will need to know of our universe to be super-relevant to our universe.
What’s the utility function? I can imagine that resulting in several problems.
For example, U = 1/T, where T is the time (measured in virtual computing environment cycles) until the correct output is produced.
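One way to make this explicit, treating an incorrect or missing output as zero payoff (an assumption of mine, not specified above):

$$
U =
\begin{cases}
  1/T & \text{if the output is a correct proof, disproof, or proof of undecidability,} \\
  0   & \text{otherwise,}
\end{cases}
$$

where $T$ is the number of virtual-environment cycles elapsed before the output is produced.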
This sounds more like a tool AI! I thought that agent AIs generally had more persistent utility measures; this looks like the sort of thing where the AI has NO utility-maximizing behavior until a problem is presented, then temporarily instantiates a problem-specific utility function (like the one above).
Well, yes, it is a tool AI. But it does have a utility function, it can be built upon a decision theory, etc., and in this sense it is an Agent.