I find it odd that Nick refers to “AGI goals” as an “anthropomorphic [and] hopelessly vague” idea. One model for AGI goals, for example, is the utility function, which is neither anthropomorphic (since humans don’t have them) nor vague.
It seems somewhat vague to me in the sense that the domain of the function is underspecified. Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality? Is that at all related to what humans would recognize as “goals” (say, the goal of visiting London)?
It seems to me that vagueness is different from having competing definitions (e.g., AIXI’s notion of utility function vs UDT’s) that may turn out to be wrong. In cryptography there are also competing formal definitions of “secure”, and for many of them it turns out they don’t coincide with our intuitive ideas of “secure”, so that a cryptographic scheme can satisfy some formal definition of security while still allowing attackers to “break” the scheme and steal information through ways not anticipated by the designer. Note that this is after several decades of intensive research by hundreds of cryptologists world-wide. Comparatively the problem of “AGI goals” has just begun to be studied. What is it that makes “hopelessly anthropomorphic and vague” apply to “AGI goals”, but not to “cryptographic security” as of, say, 1980?
It seems to me that vagueness is different from having competing definitions (e.g., AIXI’s notion of utility function vs UDT’s) that may turn out to be wrong.
AIXI’s utility function is useless, the fact that it can be called “utility function” notwithstanding. UDT’s utility function is not defined formally (its meaning depends on “math intuition”). For any real-world application of a utility function, we don’t have a formal notion of its domain. These definitions are somewhat vague, even if not hopelessly so. They are hopelessly vague for the purpose of building a FAI.
These definitions are somewhat vague, even if not hopelessly so.
Perhaps I shouldn’t have implied or given the impression that we have fully non-vague definitions of “utility function”. What if I instead said that our notions of utility function are not as vague as Vaniver makes them out to be? That our most promising approach for how to define “utility function” gives at least fairly clear conceptual guidance as to the domain, and that we can see some past ideas (e.g., just over sensory inputs) as definitely wrong?
That our most promising approach for how to define “utility function” gives at least fairly clear conceptual guidance as to the domain
Given that the standard of being “fairly clear” is rather vague, I don’t know if I disagree, but at the moment I don’t know of any approach to a potentially FAI-grade notion of preference of any clarity. Utility functions seem to be a wrong direction, since they don’t work in the context of the idea of control based on resolution of logical uncertainty (structure). (UDT’s “utility function” is more of a component of definition of something that is not a utility function.)
ADT utility value (which is a UDT-like goal definition) is somewhat formal, but only applies to toy examples, it’s not clear what it means even in these toy examples, it doesn’t work at all when there is uncertainty or incomplete control over that value on the part of the agent, and I have no idea how to treat the physical world in its context. (It also doesn’t have any domain, which seems like a desirable property for a structuralist goal definition.) This situation seems like the opposite of “clear” to me...
In my original UDT post, I suggested:
In this case, we’d need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory.
Of course there are enormous philosophical and technical problems involved with this idea, but given that it has more or less guided all subsequent decision theory work by our community (except possibly work within SI that I’ve not seen), Vaniver’s characterization of how much the domain of the utility function is underspecified (“Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality?”) is just wrong.
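To make the quoted proposal slightly more concrete, here is a deliberately minimal sketch (my own illustration, not anything from the original post or from SI’s work): “worlds” are represented as frozen sets of sentence labels standing in for well-formed formulas, and the agent’s preferences are just a utility function over those sets. Every sentence name, weight, and score below is invented for the example.

```python
# Toy sketch: a utility function whose domain is sets of sentences
# (stand-ins for conjunctions of well-formed formulas), plus a made-up
# prior over candidate "possible worlds" described by those sentences.

from typing import FrozenSet, Dict

Sentences = FrozenSet[str]

def utility(world: Sentences) -> float:
    """Score a world by which (invented) sentences hold in its description."""
    score = 0.0
    if "exists_paperclip" in world:
        score += 1.0
    if "exists_happy_human" in world:
        score += 10.0
    return score

# A toy "prior" over candidate mathematical structures / possible worlds.
prior: Dict[Sentences, float] = {
    frozenset({"exists_paperclip"}): 0.3,
    frozenset({"exists_happy_human"}): 0.2,
    frozenset({"exists_paperclip", "exists_happy_human"}): 0.5,
}

expected_utility = sum(w * utility(world) for world, w in prior.items())
print(expected_utility)  # 0.3*1 + 0.2*10 + 0.5*11 = 7.8
```

This is only meant to show what “a utility function over conjunctions of sentences” could even look like as a data type; it says nothing about the hard part, which is choosing the sentences and weights.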
Right, preference over possible logical consequences of given situations is a strong unifying principle. We can also take the physical world to be a certain collection of mathematical structures, possibly heuristically selected based on observations as being controllable and morally relevant in a tractable way.
The tricky thing is that we are not choosing a structure among some collection of structures (a preferred possible world from a collection of possible worlds); instead we are choosing which properties a given fixed class of structures will have, or alternatively which theories/definitions are consistent or inconsistent, which defined classes of structures exist vs. don’t exist. Since the alternatives that are not chosen are thereby made inconsistent, it’s not clear how to understand them as meaningful possibilities; they are the mysterious logically impossible possible worlds. And there we have it: the mystery of the domain of preference.
Well, the publicly visible side of the work does not seem to refer specifically to this, hence it is not so specified.
With regards to your idea: if you have to make that sort of preference function to motivate the AI in a sufficiently general manner that it may just be hell-bent on killing everyone, I think the ‘risk’ is very far-fetched, exactly per my suspicion that a viable super-intelligent UFAI is much too hard to be worth worrying about (while all sorts of AIs that work on well-specified mathematical problems would be much, much simpler). If it is the only way to motivate an AI that is effective over the evolution from seed to super-intelligence, then the only people who are working on something like UFAI are the FAI crowd. Keep in mind that if I want to cure cancer via the ‘software tools’ route and I am not signed up for cryonics, then I’ll just go for the simplest solution that works, which will be some sort of automated reasoning over formal systems (not over real-world states). Especially as the general AI would require the technologies from the former anyway.
It’s somewhat vague, not necessarily hopelessly so. The question of the domain of utility functions seems important and poorly understood, not to mention the possible inadequacy of the idea of utility functions over worlds, as opposed to something along the lines of a fixed utility value definition that doesn’t explicitly refer to any worlds.
It’s valuing external reality. Valuing sensory inputs and mental models would just result in wireheading.
It would have a utility function, in which it assigns value to possible futures. It’s not really a “goal” per se unless it’s a satisficer. Otherwise, it’s more of a general idea of what’s better or worse. It would want to make as many paperclips as it can, rather than build a billion of them.
Mathematically, any value that an AI can calculate from anything external is a function of its sensory input.
Given the same stream of sensory inputs, external reality may be different depending on the AI’s outputs, and the AI can prefer one output to another based on their predicted effects on external reality even if they make no difference to its future sensory inputs.
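Here is a toy illustration of this point (my own construction, not SI’s or Wei Dai’s formalism): two candidate outputs produce the same future sensory stream for the agent but different external states, so a reward defined on sensory input cannot distinguish them, while a reward defined on the (modeled) external state can. The world dynamics, action names, and reward definitions are all invented for the example.

```python
# Two actions, identical future observations, different external outcomes.

def external_state(action: str) -> dict:
    # Hypothetical world dynamics: both actions leave the camera pointed at a wall.
    if action == "build_paperclip_behind_wall":
        return {"sensor": "blank_wall", "paperclips": 1}
    else:  # "do_nothing"
        return {"sensor": "blank_wall", "paperclips": 0}

def reward_over_senses(state: dict) -> float:
    return 1.0 if state["sensor"] == "blank_wall" else 0.0   # identical for both actions

def reward_over_reality(state: dict) -> float:
    return float(state["paperclips"])                        # distinguishes the actions

for action in ("build_paperclip_behind_wall", "do_nothing"):
    s = external_state(action)
    print(action, reward_over_senses(s), reward_over_reality(s))
# Both actions score 1.0 on the sensory reward, but only the first scores on the
# reality-based reward, even though the agent's future observations are identical.
```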
‘Vague’ presumes a level of precision that is not present here. It is not even vague. It’s incoherent.
Even if you were right that valuing external reality is equivalent to valuing sensory input, how would that make it incoherent? Or are you saying that the idea of “external reality” is inherently incoherent?
The ‘predicted effects on external reality’ are a function of prior input and internal state.
The idea of external reality is not incoherent. The idea of valuing external reality with a mathematical function is.
Note, by the way, that valuing a ‘wire in the head’ is also a type of ‘valuing external reality’—not in the sense of the wire being outside the box that runs the AI, but external in the sense of the wire being outside the algorithm of the AI. When that point is being discussed here, SI seem to magically acquire an understanding of the distinction between outside an algorithm and inside an algorithm, to argue that wireheading won’t happen. The confusion between model and reality appears and disappears at the most convenient moments.
I think I’m getting a better idea of where our disagreement is coming from. You think of external reality as some particular universe, and since we don’t have direct knowledge of what that universe is, we can only apply our utility function to models of it that we build using sensory input, and not to external reality itself. Is this close to what you’re thinking?
If so, I suggest that “valuing external reality” makes more sense if you instead think of external reality as the collection of all possible universes. I described this idea in more detail in my post introducing UDT.
How would this assign utility to performing an experiment to falsify (drop probability of) some of the ‘possible worlds’ ? Note that such action decreases the sum of value over possible worlds by eliminating (decreasing weight of) some of the possible worlds.
Please note that the “utility function” to which Nick Szabo refers is the notion that is part of the SI marketing pitch, and there it alludes to the concept of utility from economics—which does actually make the agent value gathering information—and creates the impression that this is a general concept applicable to almost any AI, something likely to be created by an AGI team unaware of the whole ‘friendliness’ idea, and something that would be simple to make for paperclips; as if the world-class technological genius of the future AGI creators were just a skill for making real the stupid wishes that need to be corrected by SI.
Meanwhile, in the non-vague sense that you outline here, it appears much more dubious that anyone who does not believe in the feasibility of friendliness would want to build this; it’s not even clear that anyone could. By contrast, an AI whose goal is defined only within a model based on physics as we know it, lacking any sort of tie of that model to anything real—no value placed on keeping the model in sync with the world—is sufficient to build all that we need for mind uploading. Sensing is a very hard problem in AI, much more so for AGI.
How would this assign utility to performing an experiment to falsify (drop probability of) some of the ‘possible worlds’ ?
UDT would want to perform experiments so that it can condition its future outputs on the results of those experiments (i.e., give different outputs depending on how the experiments come out). This gives it higher utility without “falsifying” any of the possible worlds.
Note that such action decreases the sum of value over possible worlds by eliminating (decreasing weight of) some of the possible worlds.
The reason UDT is called “updateless” is that it doesn’t eliminate or change the weight of any of the possible worlds. You might want to re-read the UDT post to better understand it. (A toy numerical sketch of this follows after this comment.)
The rest of your comment makes some sense, but is your argument that without SI (if it didn’t exist), nobody else would try to make an AGI with senses and real-world goals? What about those people (like Ben Goertzel) who are currently trying to build such AGIs? Or is your argument that such people have no chance of actually building such AGIs at least until mind uploading happens first? What about the threat of neuromorphic (brain-inspired) AGIs as we get closer to achieving uploading?
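The following is a toy numerical sketch of the point about performing experiments without updating, under my own invented setup (the worlds, weights, observations, and utilities are all made up, and this is far simpler than UDT’s actual formalism). The agent never changes the weights of the possible worlds; it only picks a policy (a map from observation to action), and a policy that conditions on the experiment’s outcome gets higher expected utility than any constant policy.

```python
from itertools import product

worlds = {
    # name: (weight, observation the experiment yields in that world, utility per action)
    "world_A": (0.5, "result_a", {"act_1": 10, "act_2": 0}),
    "world_B": (0.5, "result_b", {"act_1": 0, "act_2": 10}),
}

observations = ["result_a", "result_b"]
actions = ["act_1", "act_2"]

def expected_utility(policy: dict) -> float:
    # Weights stay fixed; only the policy varies.
    total = 0.0
    for weight, obs, utils in worlds.values():
        total += weight * utils[policy[obs]]
    return total

best_policy, best_eu = None, float("-inf")
for choice in product(actions, repeat=len(observations)):
    policy = dict(zip(observations, choice))
    eu = expected_utility(policy)
    if eu > best_eu:
        best_policy, best_eu = policy, eu

print(best_policy, best_eu)
# {'result_a': 'act_1', 'result_b': 'act_2'} 10.0 -- versus 5.0 for either constant
# policy, with the world weights (0.5, 0.5) never modified.
```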
The reason UDT is called “updateless” is that it doesn’t eliminate or change the weight of any of the possible worlds. You might want to re-read the UDT post to better understand it.
A particular instance of UDT running a particular execution history gets to condition on that execution history; you could say that what I call updates you call conditioning. In practice you will want not to run the computations irrelevant to the particular machine, and you will have strictly less computing power in the machine than in the universe it inhabits, the machine itself included. It would be good if you could provide an example of the experiments it might perform, somewhat formally derived. It feels to me that while it is valuable that you have formalized some of the notions, you have largely shifted/renamed all the actual problems.
E.g. it is problematic to specify a utility function on reality; it’s incoherent. In your case the utility function is specified on all mathematically representable theories, which may well not allow one to actually value a paperclip. Plus the number of potential paperclips within a theory would grow larger than any computable function of the size of the theory, and the actions may well be dominated by relatively small, but absolutely enormous, differences between huge theories. Can you give an actual example of some utility function? It doesn’t have to correspond to paperclips—anything so that UDT with this plugged in would actually do something to our reality rather than to the imaginary BusyBeaver(100) beings with imaginary dust specks in their eyes which might be running a boxed sim of our world.
With regards to Ben Goertzel, where does his AGI include anything like this not so vague utility function of yours? The marketing spiel in question is, indeed, that Ben Goertzel’s AI (or someone else’s) would maximize a utility function and kill everyone or something, which leads me to assume that they are not talking about your utility function.
With regards to neuromorphic AGIs, I think there’s far too much science fiction and far too little understanding of neurology in the rationalization of ‘why am I getting paid’. While I do not doubt that the brain does implement some sort of ‘master trick’ in, perhaps, every cortical column, there is an elaborate system for motivating the whole thing, and that system quite thoroughly fails to care about real-world state, indeed. And once again, why do you think neuromorphic AGIs would have the sort of real-world values per UDT?
edit: furthermore it seems fairly preposterous to assume a high probability that your utility function will actually be implemented in a working manner—say, a paperclip-maximizing manner—by people who really need SI to tell them to beware of creating Skynet. SI is the typical ‘high-level idea guys’ outfit, with a belief that the tech guys who are much smarter than them are in fact specialized in lowly stuff and need the high-level idea guys to provide philosophical guidance or else we all die. An incredibly common sight in startups that should never have started up (and that fail invariably).
With regards to Ben Goertzel, where does his AGI include anything like this not so vague utility function of yours?
You seem to think that I’m claiming that UDT’s notion of utility function is the only way real-world goals might be implemented in an AGI. I’m instead suggesting that it is one way to do so. It currently seems to be the most promising approach for FAI, but I certainly wouldn’t say that only AIs using UDT can be said to have real-world goals.
At this point I’m wondering if Nick’s complaint of vagueness was about this more general usage of “goals”. It’s unclear from reading his comment, but in case it is, I can try to offer a definition: an AI can be said to have real-world goals if it tries to (and generally succeeds at) modeling its environment and chooses actions based on their predicted effects on its environment.
Goals in this sense seem to be something that AGI researchers actively pursue, presumably because they think it will make their AGIs more useful or powerful or intelligent. If you read Goertzel’s papers, he certainly talks about “goals”, “perceptions”, “actions”, “movement commands”, etc.
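A minimal sketch of the definition offered above—an agent that models its environment and picks actions by their predicted effects on that environment. Everything here (the thermostat-like environment, the model, the target value, the candidate actions) is a made-up toy, not anyone’s actual AGI architecture.

```python
class ToyEnvironment:
    def __init__(self):
        self.temperature = 30.0
    def step(self, action: float):
        self.temperature += action          # action = heating/cooling applied
    def observe(self) -> float:
        return self.temperature             # noiseless observation, for simplicity

class ModelBasedAgent:
    def __init__(self, target: float):
        self.target = target
        self.model_temperature = None       # internal model of the environment
    def update_model(self, observation: float):
        self.model_temperature = observation
    def choose_action(self, candidates=(-2.0, -1.0, 0.0, 1.0, 2.0)) -> float:
        # Predict the effect of each candidate action *inside the model*,
        # then pick the one whose predicted outcome best serves the goal.
        def predicted_error(a):
            return abs((self.model_temperature + a) - self.target)
        return min(candidates, key=predicted_error)

env, agent = ToyEnvironment(), ModelBasedAgent(target=21.0)
for _ in range(10):
    agent.update_model(env.observe())
    env.step(agent.choose_action())
print(round(env.observe(), 1))  # converges toward 21.0
```

In the sense of the definition above, this agent has a (trivial) real-world goal: its choices are driven by predicted effects on the environment variable, not by any property of its observations as such.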
You seem to think that I’m claiming that UDT’s notion of utility function is the only way real-world goals might be implemented in an AGI. I’m instead suggesting that it is one way to do so. It currently seems to be the most promising approach for FAI, but I certainly wouldn’t say that only AIs using UDT can be said to have real-world goals.
Then your having formalized your utility function has nothing to do with the allegations of vagueness when it comes to defining the utility in the argument for how utility maximizers are dangerous. With regards to it being ‘the most promising approach’, I think it is a very, very silly idea to have an approach so general that we all may well end up sacrificed in the name of a huge number of imaginary beings that might exist, with an AI pascal-wagering itself on its own. It looks like a dead end, especially for friendliness.
At this point I’m wondering if Nick’s complaint of vagueness was about this more general usage of “goals”. It’s unclear from reading his comment, but in case it is, I can try to offer a definition: an AI can be said to have real-world goals if it tries to (and generally succeeds at) modeling its environment and chooses actions based on their predicted effects on its environment.
This doesn’t necessarily work like ‘I want most paperclips to exist, therefore I will talk my way into controlling the world, then kill everyone and make paperclips’, though.
Goals in this sense seem to be something that AGI researchers actively pursue, presumably because they think it will make their AGIs more useful or powerful or intelligent. If you read Goertzel’s papers, he certainly talks about “goals”, “perceptions”, “actions”, “movement commands”, etc.
They also don’t try to make goals that couldn’t be outsmarted into nihilism. We humans sort-of have a goal of reproduction, except we’re too clever, and we use birth control.
In your UDT, the actual intelligent component is this mathematical intuition that you’d use to process this theory in reasonable time. The rest is optional and highly difficult (if not altogether impossible) icing, even for the most trivial goal such as paperclips, which may well in principle never work.
And the technologies employed in the intelligent component are, without any of those goals, and with a much smaller requirement on intelligence (as in computing power and optimality), sufficient for e.g. designing machinery for mind uploading.
Furthermore, and that is the most ridiculous thing, there is this ‘oracle AI’ being talked about, where an answering system is modelled as based on real world goals and real world utilities, as if those were somehow primal and universally applicable.
It seems to me that the goals and utilities are just a useful rhetorical device used to trigger the anthropomorphization fallacy at will (in a selective way), so as to solicit donations.
They also don’t try to make goals that couldn’t be outsmarted into nihilism.
They’re not explicitly trying to solve this problem because they don’t think it’s going to be a problem with their current approach of implementing goals. But suppose you’re right and they’re wrong, and somebody that wants to build an AGI ends up implementing a motivational system that outsmarts itself into nihilism. Well such an AGI isn’t very useful so wouldn’t they just keep trying until they stumble onto a motivational system that isn’t so prone to nihilism?
We humans sort-of have a goal of reproduction, except we’re too clever, and we use birth control.
Similarly, if we let evolution of humans continue, wouldn’t humans pretty soon have a motivational system for reproduction that we won’t want to cleverly work around?
They’re not explicitly trying to solve this problem because they don’t think it’s going to be a problem with their current approach of implementing goals.
They do not expect foom either.
Well such an AGI isn’t very useful
You can still have formally defined goals—satisfy conditions on equations, et cetera. Defined internally, without the problematic real-world component. Use this for e.g. designing reliable cellular machinery (‘cure cancer and senescence’). Seems very useful to me. (A toy sketch of such an internally defined goal follows after this comment.)
so wouldn’t they just keep trying until they stumble onto a motivational system that isn’t so prone to nihilism?
How long would it take you to ‘stumble’ upon some goal for the UDT that translates to something actually real?
Similarly, if we let evolution of humans continue, wouldn’t humans pretty soon have a motivational system for reproduction that we won’t want to cleverly work around?
Evolution destructively tests designs against reality. Humans do have various motivational systems there, such as religion, btw.
I am not sure how you think a motivational system for reproduction could work, so that we would not embrace a solution that actually does not result in reproduction. (Given sufficient intelligence)
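As a toy rendering of the kind of internally defined goal mentioned above: the “goal” is just “find values satisfying these equations”, stated and checked entirely inside the formal system, with no term in it that refers to the outside world. The equations themselves are arbitrary placeholders chosen for the example.

```python
from itertools import product

def satisfies(x: int, y: int) -> bool:
    # "Conditions on equations": x + y == 12 and x * y == 35
    return x + y == 12 and x * y == 35

solutions = [(x, y) for x, y in product(range(-50, 51), repeat=2) if satisfies(x, y)]
print(solutions)  # [(5, 7), (7, 5)] -- the goal is met once these are found; nothing
                  # in the goal statement rewards influencing anything outside the program.
```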
They do not expect foom either.
Goertzel does, or at least thinks it’s possible. See http://lesswrong.com/lw/aw7/muehlhausergoertzel_dialogue_part_1/ where he says “GOLEM is a design for a strongly self-modifying superintelligent AI system”. Also http://novamente.net/AAAI04.pdf where he talks about Novamente potentially being “thoroughly self-modifying and self-improving general intelligence”.
You can still have formally defined goals—satisfy conditions on equations, et cetera.
As I mentioned, there are AGI researchers trying to implement real-world goals right now. If they build an AGI that turns nihilistic, do you think they will just give up and start working on equation solvers instead, or try to “fix” their AGI?
How long would it take you to ‘stumble’ upon some goal for the UDT that translates to something actually real?
I guess probably not very long, if I had a working solution to “math intuition”, a sufficiently powerful computer to experiment with, and no concerns for safety...
Mathematically, any value that an AI can calculate from anything external is a function of its sensory input.
Actions are the product of sensory input and existing state—but the basic idea withstands this, I think.
Mathematically, any value that an AI can calculate from anything external is a function of its sensory input.
Sure, but the kind of function matters for our purposes. That is, there’s a difference between an optimizing system that is designed to optimize for sensory input of a particular type, and a system that is designed to optimize for something that it currently treats sensory input of a particular type as evidence of, and that’s a difference I care about if I want that system to maximize the “something” rather than just rewire its own perceptions.
Be specific as to what the input domain of the ‘function’ in question is.
And yes, there is a difference: one is well defined and is what AI research works towards, and the other is part of an extensive AI-fear rationalization framework, where it is confused with the notion of generality of intelligence, so as to presume that practical AIs will maximize the “somethings”, followed by the notion that pretty much all “somethings” would be dangerous to maximize. The utility is a purely descriptive notion; the AI that decides on actions is a normative system.
edit: To clarify, intelligence is defined here as a ‘cross-domain optimizer’ that would therefore be able to maximize something vague without it having to be coherently defined. It is similar to knights of the round table worrying that an AI would literally search for the holy grail, because to said knights the abstract and ill-defined goal of the holy grail appears entirely natural; meanwhile, for systems more intelligent than said knights, such a confused goal, due to its incoherence, is impossible to define.
(shrug)
It seems to me that even if I ignore everything SI has to say about AI and existential risk and so on, ignore all the fear-mongering, etc., the idea of a system that attempts to change its environment so as to maximize the prevalence of some X remains a useful idea.
And if I extend the aspects of its environment that the system can manipulate to include its own hardware or software, or even just its own tuning parameters, it seems to me that there exists a perfectly crisp, measurable distinction between a system A that continues to increase the prevalence of X in its environment, and a system B that instead manipulates its own subsystems for measuring X.
If any part of that is as incoherent as you suggest, and you’re capable of pointing out the incoherence in a clear fashion, I would appreciate that.
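Here is a concrete (and entirely invented) rendering of the A/B distinction above: both agents report a “measured X”, but only agent A changes the simulated environment’s actual X count, while agent B just rescales its own sensor. The distinction is crisp in the sense that an outside check of the environment separates them, whatever their self-reports say.

```python
class Env:
    def __init__(self):
        self.x_count = 1

class AgentA:                       # increases the prevalence of X in the environment
    def __init__(self):
        self.sensor_gain = 1.0
    def act(self, env: Env):
        env.x_count += 1
    def measured_x(self, env: Env) -> float:
        return env.x_count * self.sensor_gain

class AgentB(AgentA):               # manipulates its own X-measuring subsystem
    def act(self, env: Env):
        self.sensor_gain *= 2.0     # tamper with the sensor instead of the world

env_a, env_b = Env(), Env()
a, b = AgentA(), AgentB()
for _ in range(5):
    a.act(env_a)
    b.act(env_b)

print(a.measured_x(env_a), env_a.x_count)  # 6.0 6  -> report tracks the actual count
print(b.measured_x(env_b), env_b.x_count)  # 32.0 1 -> report inflated, world unchanged
```

Whether this toy distinction survives once the “X” in question has no agreed-upon model of the world behind it is, of course, exactly what the surrounding exchange is disputing.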
the idea of a system that attempts to change its environment so as to maximize the prevalence of some X remains a useful idea.
The prevalence of X is defined how?
And if I extend the aspects of its environment that the system can manipulate to include its own hardware or software, or even just its own tuning parameters, it seems to me that there exists a perfectly crisp, measurable distinction between a system A that continues to increase the prevalence of X in its environment, and a system B that instead manipulates its own subsystems for measuring X.
In A, you confuse your model of the world with the world itself; in your model of the world you have a possible item ‘paperclip’, and you can therefore easily imagine maximization of the number of paperclips inside your model of the world, complete with the AI necessarily trying to improve its understanding of the ‘world’ (your model). With B, you construct a falsely singular alternative of a rather broken AI, and see a crisp distinction between two irrelevant ideas.
The practical issue is that the ‘prevalence of some X’ cannot be specified without a model of the world; you cannot have a function without specifying its input domain, and ‘reality’ is never the input domain of a mathematical function; the notion is not only incoherent but outright nonsensical.
If any part of that is as incoherent as you suggest, and you’re capable of pointing out the incoherence in a clear fashion, I would appreciate that.
Incoherence of so poorly defined concepts cannot be demonstrated when no attempt has been made to make the notions specific enough to even rationally assert coherence in the first place.
OK. Thanks for your time.
It can only be said to be powerful if it will tend to do something significant regardless of how you stop it. If what it does has anything in common, even if it’s nothing beyond “significant”, it can be said to value that.
Actually, this is an example of something incredibly irritating about this entire singularity topic: verbal sophistry of no consequence. What you call ‘powerful’ has absolutely zero relation to anything. A powerful drill doesn’t tend to do something significant regardless of how you stop it. Neither does a powerful computer. Nor should a powerful intelligence.
A powerful drill doesn’t tend to do something significant regardless of how you stop it. Neither does a powerful computer. Nor should a powerful intelligence.
In this case, I’m defining a powerful intelligence differently. An AI that is powerful in your sense is not much of a risk. It’s basically the kind of AI we have now. It’s neither highly dangerous, nor highly useful (in a singularity-inducing sense).
Building an AGI may not be feasible. If it is, it will be far more effective than a narrow AI, and far more dangerous. That’s why it’s primarily what SIAI is worried about.
nor highly useful (in a singularity-inducing sense).
I’m not clear what we mean by singularity here. If we had an algorithm that works on well defined problems we could solve practical problems. edit: Like improving that algorithm, mind uploading, etc.
Building an AGI may not be feasible. If it is, it will be far more effective than a narrow AI,
Effective at what? Would it cure cancer sooner? I doubt it. An “AGI” with a goal it wants to pursue, resisting any control, is a much narrower AI than the AI that basically solves systems of equations. Who would I rather hire: an impartial math genius that solves the tasks you specify for him, or a brilliant murderous sociopath hell-bent on doing his own thing? The latter’s usefulness (to me, that is) is incredibly narrow.
and far more dangerous.
Besides being effective at being worse than useless?
That’s why it’s primarily what SIAI is worried about.
I’m not quite sure that there’s ‘why’ and ‘what’ in that ‘worried’.
If we have an AGI, it will figure out what problems we need solved and solve them. It may not beat a narrow AI (ANI) in the latter, but it will beat you in the former. You can thus save on the massive losses due to not knowing what you want, politics, not knowing how to best optimize something, etc. I doubt we’d be able to do 1% as well without an FAI as with one. That’s still a lot, but that means that a 0.1% chance of producing an FAI and a 99.9% chance of producing a UAI is better than a 100% chance of producing a whole lot of ANIs.
The latter’s usefulness (to me, that is) is incredibly narrow.
If we have an AGI, it will figure out what problems we need solved and solve them.
Only a friendly AGI would. The premise for funding to SI is not that they will build a friendly AGI. The premise is that there is an enormous risk that someone else would, for no particular reason, add this whole ‘valuing real world’ thing into an AI, without adding any friendliness, actually restricting its generality when it comes to doing something useful.
Ultimately, the SI position is: input from us, the idea guys with no achievements (outside philosophy), is necessary for a team competent enough to build a full AGI not to kill everyone, and therefore you should donate. (Previously, the position was that you should donate so we build FAI before someone builds UFAI, but Luke Muehlhauser has been generalizing to non-FAI solutions.) That notion is rendered highly implausible when you pin down the meaning of AGI, as we did in this discourse. For the UFAI to happen and kill everyone, a potentially vastly more competent and intelligent team than SI has to fail spectacularly.
Only if his own thing isn’t also your own thing.
That will require a simulation of me or a brain implant that effectively makes it an extension of me. I do not want the former, and the latter is IA.
It’s valuing external reality. Valuing sensory inputs and mental models would just result in wireheading.
Thinking they are valuing “external reality” probably doesn’t really protect agents from wireheading. The agents just wind up with delusional ideas about what “external reality” consists of—built of the patchwork of underspecification left by the original programmers of this concept.
I know that it’s possible for an agent that’s created with a completely underspecified idea of reality to nonetheless value external reality and avoid wireheading. I know this because I am such an agent.
Everything humans can do, an AI could do. There’s little reason to believe humans are remotely optimum, so an AI could likely do it better.
The “everything humans can do, an AI could do better” argument cuts both ways. Humans can wirehead—machines may be able to wirehead better. That argument is pretty symmetric with the “wirehead avoidance” argument. So: I don’t think either argument is worth very much. There may be good arguments that illuminate the future frequency of wireheading, but these don’t qualify. It seems quite possible that our entire civilization could wirehead itself—along the lines suggested by David Pearce.
Everything a human can do, a human cannot do in the most extreme possible manner. An AI could be made to wirehead easier or harder. It could think faster or slower. It could be more creative or less creative. It could be nicer or meaner.
I wouldn’t begin to know how to build an AI that’s improved in all the right ways. It might not even be humanly possible. If it’s not humanly possible to build a good AI, it’s likely impossible for the AI to be able to improve on itself. There’s still a good chance that it would work.
Probably true—and few want wireheading machines—but the issues are the scale of the technical challenges, and—if these are non-trivial—how much folk will be prepared to pay for the feature. In a society of machines, maybe the occasional one that turns Buddhist—and needs to go back to the factory for psychological repairs—is within tolerable limits.
Many apparently think that making machines value “external reality” fixes the wirehead problem—e.g. see “Model-based Utility Functions”—but it leads directly to the problems of what you mean by “external reality” and how to tell a machine that that is what it is supposed to be valuing. It doesn’t look much like solving the problem to me.