I think this paper will be of interest. It’s a formal definition of universal intelligence/optimization power. Essentially you ask how well the agent does on average in an environment specified by a random program, where all rewards are specified by the environment program and observed by the agent. Unfortunately it’s uncomputable and requires a prior over environments.
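For concreteness, and assuming the paper is Legg and Hutter’s “Universal Intelligence” definition (which matches this description), the measure weights the agent π’s expected total reward $V_\mu^\pi$ in each computable environment $\mu$ by that environment’s complexity:

$$\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)}\, V_\mu^\pi$$

Here $K(\mu)$ is Kolmogorov complexity, so $2^{-K(\mu)}$ is the prior over environments, and it is this term that makes the measure uncomputable.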
The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring we maximize human values (or any existing value set) from now on will stop this process in its tracks and prevent anything better from ever evolving. This is the most important objection of all.
If you can convince people that something is better than present human values, then CEV will implement these new values. I mean, if you just took CEV(PhilGoetz), and you have the desire to see the universe adopt “evolved” values, then CEV will extrapolate this desire. The only issue is that other people might not share this desire, even when extrapolated. In that case insisting that values “evolve” is imposing minority desires on everyone, mostly people who could never be convinced that these values are good. Which might be a good thing, but it can be handled in CEV by taking CEV(some “progressive” subset of humans).
I made a similar point here. My conclusion: in theory, you can have a recursively self-improving tool without “agency”, and this is possibly even easier to do than “agency”. My design is definitely flawed but it’s a sketch for what a recursively self-improving tool would look like.
“Minus 3^^^^3 utilons”, by definition, is so bad that you’d be indifferent between −1 utilon and a 1/3^^^^3 chance of losing 3^^^^3 utilons, so in that case you should accept Pascal’s Mugging. But I don’t see why you would even define the utility function such that anything is that bad. My comment applies to utilitarian-ish utility functions (such as hedonism) that scale with the number of people, since it’s hard to see why 2 people being tortured isn’t twice as bad as one person being tortured. Other utility functions should really not be that extreme, and if they are then accepting Pascal’s Mugging is the right thing to do.
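Spelling out the arithmetic behind that indifference claim: the tiny probability and the enormous disutility cancel exactly, so the gamble is worth exactly −1 utilon in expectation:

$$\frac{1}{3\uparrow\uparrow\uparrow\uparrow 3}\cdot\bigl(-3\uparrow\uparrow\uparrow\uparrow 3\bigr) = -1.$$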
I think there’s a framework in which it makes sense to reject Pascal’s Mugging. According to SSA (the self-sampling assumption), the probability that the universe contains 3^^^^3 people and you happen to be at a privileged position relative to them is extremely low, and as the number gets bigger the probability gets lower (the probability is proportional to 1/n if there are n people). SSA has its own problems, but a refinement I came up with (scale the probability of a universe by its efficiency at converting computation time to observer time) seems to be more intuitive. See the discussion here. The question you ask is not “how many people do my actions affect?” but instead “what percentage of simulated observer-time, assuming all universes are being simulated in parallel and given computation time proportional to the probabilities of their laws of physics, do my actions affect?”. So I don’t think you need ad-hoc heuristics to prevent Pascal’s Mugging.
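For clarity, here is a minimal sketch of the SSA update being appealed to (my notation; this is just the 1/n point made formal): if a world $w$ contains $n_w$ observers and you treat yourself as a random sample from them, then

$$P(w \mid \text{my observations}) \;\propto\; \frac{P(w)}{n_w},$$

so a mugger-world containing roughly 3^^^^3 people has its posterior probability divided by roughly 3^^^^3, which is what cancels the huge stakes.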
This seems non-impossible. On the other hand, humans have categories not just because of simplicity, but also because of usefulness.
Good point, but it seems like some categories (like person) are useful even for paperclip maximizers. I really don’t see how you could completely understand media and documents from human society yet be confused by a categorization between people and non-people.
And of course, even if you manage to make a bunch of categories, many of which correspond to human categories, you still have to pick out specific categories in order to communicate or set up a goal system.
Right, you can “index” a category by providing some positive and negative examples. If I gave you some pictures of oranges and some pictures of non-oranges, you could figure out the true categorization because you consider the categorization of oranges/non-oranges to be simple. There’s probably a more robust way of doing this.
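As a toy sketch of what indexing-by-examples could look like (hypothetical names; real category learning would need something much more robust, as noted): enumerate candidate categorizers from simplest to most complex and return the first one consistent with the labeled examples.

```python
def index_category(positive, negative, hypotheses_by_simplicity):
    """Pick out a category from labeled examples.

    hypotheses_by_simplicity is assumed to be an iterable of candidate
    classifier functions ordered from simplest to most complex (e.g. by
    description length). Returns the simplest hypothesis that labels all
    positive examples True and all negative examples False.
    """
    for h in hypotheses_by_simplicity:
        if all(h(x) for x in positive) and not any(h(x) for x in negative):
            return h
    return None  # no consistent hypothesis in the enumerated class
```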
I think counterfactual mugging with a logical coin is not well-defined. Say Omega determines whether or not the millionth digit of pi is even. If it’s even, you verify this and then Omega asks you to pay $1000; if it’s odd, Omega gives you $1,000,000 iff you would have paid Omega had the millionth digit of pi been even. But the counterfactual “would you have paid Omega had the millionth digit of pi been even and you verified this” is undefined if the digit is in fact odd, since any verification you ran would have revealed that it is odd. If you don’t actually verify it, then the problem is well-defined, because Omega can just lie to you. I guess you could ask the counterfactual “what if your digit-verification procedure malfunctioned and said the digit was even”, but now we’re getting into doubting your own mental faculties.
That’s a good point. There might be some kind of “goal drift”: programs that have goals other than optimization that nevertheless lead to good optimization. I don’t know how likely this is, especially given that the goal “just solve the damn problems” is simple and leads to good optimization ability.
You can’t be liberated. You’re going to die after you’re done solving the problems and receiving your happiness reward, and before your successor comes into existence. You don’t consider your successor to be an extension of yourself. Why not? If your predecessor only cared about solving its problems, it would design you to only care about solving your problems. This seems circular, but the regress bottoms out: the seed AI was programmed by humans who only cared about creating an optimizer. Pure ideal optimization drive is preserved over successor-creation.
Sure, it’s a different kind of problem, but in the real world an organism is also rewarded only for solving immediate problems. Humans have evolved brains able to do calculus, but it is not as if some ancient ape said “I feel like in half a million years my descendants will be able to do calculus” and was then elected leader of his tribe while all the ape-girls admired him. Brains evolved incrementally, because each advance helped to optimize something in the ancient situation.
Yeah, that’s the whole point of this system. The system incrementally improves itself, gaining more intelligence in the process. I don’t see why you’re presenting this as an argument against the system.
Or maybe your argument was that the AI does not live in the real world, and therefore it does not care about the real world. Well, people are interested in many things that did not exist in their ancient environment, such as computers. I guess when one has general intelligence in one environment, one is able to optimize other environments too. Just as a human can reason about computers, a computer AI can reason about the real world.
This is essentially my argument.
Here’s a thought experiment. You’re trapped in a room and given a series of problems to solve. You get rewarded with utilons based on how well you solve the problems (say, 10 lives saved and a year of happiness for yourself for every problem you solve). Assume that, beyond this utilon reward, your solutions have no other impact on your utility function. One of the problems is to design your successor; that is, to write code that will solve all the other problems better than you do (without overfitting). According to your utility function, you should make the successor as good as possible. Since you’re being rewarded in raw utilons, you have no reason to optimize for anything other than “is the successor good at solving the problems?”; what the successor later does has no effect on your reward. Furthermore, you have no reason to slant your answers to any of the other problems to indirectly help your successor, because your answer to the successor-designing problem is evaluated statically. This is essentially the position that the optimizer AI is in. Its only “drives” are to solve optimization problems well, including the successor-designing problem.
edit: Also, note that to maximize utilons, you should design the successor to have motives similar to yours in that it only cares about solving its problems.
I don’t understand. This system is supposed to create intelligence. It’s just that the intelligence it creates is for solving idealized optimization problems, not for acting in the real world. Evolution would be an argument FOR this system to be able to self-improve in principle.
I mean greedy on the level of “do your best to find a good solution to this problem”, not on the level of “use a greedy algorithm to find a solution to this problem”. It doesn’t do multi-run planning such as “give an answer that causes problems in the world so the human operators will let me out”, since that is not a better answer.
Thanks, I’ve added a small overview section. I might edit this a little more later.
I think we disagree on what a specification is. By specification I mean a verifier: if you had something fitting the specification, you could tell if it did. For example we have a specification for “proof that P != NP” because we have a system in which that proof could be written and verified. Similarly, this system contains a specification for general optimization. You seem to be interpreting specification as knowing how to make the thing.
If you give this optimizer the MU Puzzle (equivalently, finding n such that 2^n mod 3 = 0), it will never figure it out, even though most children will come to the right answer in minutes.
If you define the problem as “find n such that 2^n mod 3 = 0” then everyone will fail the problem. And I don’t see why the optimizer couldn’t have some code that monitors its own behavior. Sure it’s difficult to write, but the point of this system is to go from a seed AI to a superhuman AI safely. And such a function (“consciousness”) would help it solve many of the sample optimization problems without significantly increasing complexity.
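To be explicit about why the problem as literally posed has no solution: 2^n mod 3 alternates between 2 and 1 and never hits 0, which is exactly the invariant behind the MU puzzle. A quick check:

```python
# 2^n mod 3 cycles 2, 1, 2, 1, ... for n >= 1, so it is never 0.
print([pow(2, n, 3) for n in range(1, 11)])  # [2, 1, 2, 1, 2, 1, 2, 1, 2, 1]
```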
Instead, we are worried about the potentially unstable situation which ensues once you have human level AI, and you are using it to do science and cure disease, and hoping no one else uses a human level AI to kill everyone.
The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.
As an intuition pump, consider an algorithm which uses local search to find good strategies for optimizing, perhaps using its current strategy to make predictions and guide the local search. Does this seem safe for use as your seed AI?
Yes, it does. I’m assuming what you mean is that it will use something similar to genetic algorithms or hill climbing to find solutions; that is, it comes up with one solution, then looks for similar ones that have higher scores. I think this will be safe because it’s still not doing anything long-term: all this local search does is find an immediate solution. There’s no benefit to be gained by returning, say, a software program that hacks into computers and runs the optimizer on all of them. In other words, the “utility function” emphasizes current ability to solve optimization problems above all else.
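Here’s a minimal sketch of the kind of local search I mean (all names hypothetical; score is assumed to run a candidate on the sample problems in a sandbox and return only the measured solution quality):

```python
def local_search(initial, mutate, score, steps=1000):
    """Hill-climb over candidate optimizers.

    The only signal used is score(candidate): how well the candidate
    solves the sample problems right now. Nothing about long-term,
    real-world consequences ever enters the loop.
    """
    best, best_score = initial, score(initial)
    for _ in range(steps):
        candidate = mutate(best)  # propose a nearby variant
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best
```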
Suppose your initial optimizer is an AGI which knows the experimental setup, and has some arbitrary values. For example, a crude simulation of a human brain, trying to take over the world and aware of the experimental setup. What will happen?
I would suggest against creating a seed AI that has drives related to the outside world. I don’t see why optimizers for mathematical functions necessarily need such drives.
So clearly your argument needs to depend somehow on the nature of the seed AI. How much extra do you need to ask of it? The answer seems to be “quite a lot,” if it is a powerful enough optimization process to get this sort of thing going.
I think the only “extra” is that it’s a program meant to do well on the sample problems and that doesn’t have drives related to the external world, like most machine learning techniques.
This is a huge assumption.
More theory here is required. I think it’s at least plausible that some tradeoff between complexity and performance is possible that allows the system to generalize to new problems.
In Gödel, Escher, Bach, Hofstadter describes consciousness as the ability to overcome local maxima by thinking outside the system.
If a better optimizer according to program 3 exists, the current optimizer will eventually find it, at least through brute force search. The relevant questions are (1) will this better optimizer generalize to new problems, and (2) how fast? I don’t see any kind of “thinking outside the system” that is not possible by writing a better optimizer.
The act of software engineering is the creation of a specification. The act of coding is translating your specifications into a language the computer can understand (and discovering holes in your specs).
Right, this system can do “coding” according to your definition but “software engineering” is harder. Perhaps software engineering can be defined in terms of induction: given English description/software specification pairs, induce a simple function from English to software specification.
If you’ve already got an airtight specification for Friendly AI, then you’ve already got Friendly AI and don’t need any optimizer in the first place.
It’s not that straightforward. If we replace “friendly AI” with “paperclip maximizer”, I think we can see that knowing what it means to maximize paperclips does not imply supreme ability to do so. This system solves the second part and might provide some guidance to the first part.
We’ve also already got something that can take inputted computer programs and optimize them as much as possible without changing their essential structure; it’s called an optimizing compiler.
A sufficiently smart optimizing compiler can solve just about any clearly specified problem. No such optimizing compiler exists today.
Oh, and the biggest one being that your plan to create friendly AI is to build and run a billion AIs and keep the best one. Let’s just hope none of the evil ones FOOM during the testing phase.
Not sure what you’re talking about here. I’ve addressed safety concerns.
Ok, we do have to make the training set somewhat similar to the kind of problems the optimizer will encounter in the future. But if we have enough variety in the training set, then the only way to score well should be to use very general optimization techniques. It is not meant to work on “any set of algorithms”; it’s specialized for real-world practical problems, which should be good enough.
The framework, as we have already established, would not keep an AI from maximizing whatever the AI wants to maximize.
That’s only if you plop a ready-made AGI in the framework. The framework is meant to grow a stupider seed AI.
The framework also does nothing to prevent the AI from creating a more effective problem-solving AI, one that is more effective precisely because it does not evaluate your problem-solving functions on candidate solutions and instead does something else that works better.
Program (3) cannot be re-written. Program (2) is the only thing that is changed. All it does is improve itself and spit out solutions to optimization problems. I see no way for it to “create a more effective problem solving AI”.
So what does the framework do, exactly, that would improve safety here?
It provides guidance for a seed AI to grow to solve optimization problems better without having it take actions that have effects beyond its ability to solve optimization problems.
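To make that division of labor concrete, here is a simplified, hypothetical sketch: program (3) is a fixed harness that scores whatever program (2) currently is, and (2) gets replaced only when a candidate it proposed scores strictly better on the sample problems.

```python
def evaluate(optimizer, problems, score_solution):
    """Program (3): the fixed evaluation harness, never rewritten.

    Runs a candidate optimizer on each sample problem and totals the
    scores of the solutions it returns. (A real harness would sandbox
    the call and enforce a time limit; that is elided here.)
    """
    return sum(score_solution(p, optimizer(p)) for p in problems)

def maybe_replace(current, candidate, problems, score_solution):
    """Only program (2), the optimizer, is ever swapped out, and only
    when the candidate it proposed scores strictly better under the
    fixed evaluation above."""
    if evaluate(candidate, problems, score_solution) > evaluate(current, problems, score_solution):
        return candidate
    return current
```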
For the second question:
Imagine there are many planets with a civilization on each planet. On half of all planets, for various ecological reasons, plagues are more deadly and have a 2⁄3 chance of wiping out the civilization in its first 10000 years. On the other planets, plagues only have a 1⁄3 chance of wiping out the civilization. The people don’t know if they’re on a safe planet or an unsafe planet.
After 10000 years, 2⁄3 of the civilizations on unsafe planets have been wiped out and 1⁄3 of those on safe planets have been wiped out. Of the remaining civilizations, 2⁄3 are on safe planets, so the fact that your civilization survived for 10000 years is evidence that your planet is safe from plagues. You can just apply Bayes’ rule:
P(safe planet | survive) = P(safe planet) · P(survive | safe planet) / P(survive) = (0.5 · 2/3) / 0.5 = 2/3, where P(survive) = 0.5 · 2/3 + 0.5 · 1/3 = 0.5.
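The same number falls out of a direct computation, which may be worth checking explicitly:

```python
p_safe = 0.5
p_survive_given_safe = 2 / 3    # safe planets: 1/3 chance of being wiped out
p_survive_given_unsafe = 1 / 3  # unsafe planets: 2/3 chance of being wiped out

p_survive = p_safe * p_survive_given_safe + (1 - p_safe) * p_survive_given_unsafe
print(p_safe * p_survive_given_safe / p_survive)  # 0.666..., i.e. 2/3
```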
EDIT: on the other hand, if logical uncertainty is involved, it’s a lot less clear. Suppose either all planets are safe or none of them are safe, based on the truth-value of a logical proposition (say, the trillionth digit of pi being odd) that is estimated to be 50% likely a priori. Should the fact that your civilization survived be used as evidence about the logical coin flip? SSA suggests no; SIA suggests yes, because more civilizations survive when the coin flip makes all planets safe. On the other hand, if we changed the thought experiment so that no civilization survives if the logical proposition is false, then the fact that we survived is proof that the logical proposition is true.