I honestly don’t understand how on Earth it would even be possible to understand FAI without understanding AGI on a general level. On some level, what you need isn’t a team of Sufficiently Advanced Geniuses who can figure out Friendliness while simultaneously minimizing their own understanding of AGI, but old-fashioned cooperation among the teams who are likely to become able to build AGI, with the shared goal of not building any agent that would defy its creators’ intentions.
(You can note that the creators’ intentions might be, so to speak, “evil”, but an agent that faithfully follows the “evil” intentions of an “evil” sort of human operator is already far Friendlier in kind than a paperclip maximizer—it was just given the wrong operator.)
A mathematical model of what this might look like: you might have a candidate class of formal models U that you think of as “all AGI,” such that you know of no “reasonably computable” member of the class (where “reasonably computable” is something you might hope to define, and such a member would correspond to an implementable AGI). Maybe you can find a subclass F in U that you think models Friendly AI. You can reason about these classes without knowing any examples of reasonably computable members of either. Perhaps you could even give an algorithm for taking an arbitrary example in U and transforming it via reasonable computation into an example of F. Then, once you actually construct an arbitrary AGI, you already know how to transform it into an FAI.
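To make the shape of that concrete, here is one way the factoring could be written down (the symbols U, F, and T below are just my notation for the classes and transform described above, nothing standard):

\[
\mathcal{F} \subseteq \mathcal{U}, \qquad T \colon \mathcal{U} \to \mathcal{F} \ \text{“reasonably” computable}, \qquad g \in \mathcal{U} \implies T(g) \in \mathcal{F}.
\]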
So the problem may be factorable such that you can solve a later part before solving the first part.
So, I’d agree it might be hard to understand F without understanding U as a class of objects. And let’s leave aside how you would find and become certain of such definitions. If you could, though, you might hope that you can define them and work with them without ever constructing an example. Patterns not far off from this occur in mathematical practice: for example, families of graphs with certain properties are known to exist via probabilistic methods, even though no examples have been constructed.
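One standard instance of that pattern (my example, not one named above): Erdős’s probabilistic lower bound for Ramsey numbers,

\[
R(k, k) > 2^{k/2} \quad \text{for } k \ge 3,
\]

shows that graphs on roughly \(2^{k/2}\) vertices with no clique or independent set of size \(k\) exist, yet no explicit construction matching that bound is known.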
Does that help, or did I misunderstand somewhere?
(edit: I don’t claim an eventual solution would fit the above description, this is just I hope a sufficient example that such things are mathematically possible)
Frankly, I don’t trust the claim that once you actually construct an arbitrary AGI, you’ll already know how to transform it into an FAI, because important components of the Friendliness problem are being completely shunted aside. For one thing, in order for this to even start making sense, you have to be able to specify a computable utility function for the AGI agent in the first place. The current models being used for this “mathematical” research don’t have any such thing: AIXI, for instance, specifies reward as a real-valued percept rather than as a function over its world-model.
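For concreteness, here is the distinction being pointed at, written schematically (a rough gloss, not Hutter’s exact notation): AIXI ranks actions by expected future reward, where each reward is just a number delivered as part of the percept, whereas a utility-function agent would evaluate a function it carries over its own world-model:

\[
\text{AIXI-style:}\ \ V \approx \mathbb{E}\Big[\textstyle\sum_{t} r_t\Big], \ \ x_t = (o_t, r_t)
\qquad \text{vs.} \qquad
\text{utility-based:}\ \ V \approx \mathbb{E}\big[U(\text{world-model state})\big].
\]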
The problem is not the need for large amounts of computing power (i.e., the problem is not specifying the right behavior and then “scaling it down” or “approximating” a “tractable example from the class”). The problem is that we cannot specify, in detail, what the agent values. No amount of math wank about “approximation” and a “candidate class of formal models U” is going to solve the basic problem of having to change the structure away from AIXI in the first place.
I really ought to apologize for my use of the term “math wank”, but this really is the exact opposite of how one constructs correct programs. What you don’t do, knowing the specification of the program you want, is try to specify a procedure that will, given an incomplete infinity of time, somehow transform an arbitrary program from some class of programs into the one you want. What you do is write the single exact program you want, correct-by-construction, and prove formally (model checking, dependent types, whatever you please) that it exactly obeys its specification.
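As a toy illustration of that workflow (my example, written in Lean; nothing in the thread specifies a language or a spec like this):

```lean
-- Toy sketch: write the one program you mean, then prove it satisfies
-- (part of) its specification, rather than searching some class of
-- programs for one that happens to comply.
def doubleAll (xs : List Nat) : List Nat :=
  xs.map (fun x => 2 * x)

-- One clause of the specification: the output has the same length as the input.
theorem doubleAll_length (xs : List Nat) :
    (doubleAll xs).length = xs.length := by
  simp [doubleAll]
```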
If you are wondering where the specification for an FAI comes from, well, that’s precisely the primary research problem to solve! But it won’t get solved by trying to write a function that takes as input an arbitrary instance or approximation of AIXI and returns that same instance of AIXI “transformed” to use a Friendly utility function.
Oh yes, it sounds like I did misunderstand you. I thought you were saying you didn’t understand how such a thing could happen in principle, not that you were skeptical of the currently popular models. The classes U and F above, should something like that ever come to pass, need not be AIXI-like (nor need they involve utility functions).
I think I’m hearing that you’re very skeptical about the validity of current toy mathematical models. It’s common for people to motte-and-bailey between the mathematics and the phenomena they’re hoping to model, and it’s an easy mistake to make. In a good discussion, you should separate the “math wank” (which I like to just call math) from the transfer of that wank to the reality you hope to model.
Sometimes toy models are helpful and sometimes they are distractions that lead nowhere or embody a mistaken preconception. I see you as claiming these models are distractions, not that no model is possible. Accurate?
I very much favor bottom-up modelling based on real evidence rather than mathematical models that come out looking neat by imposing our preconceptions on the problem a priori.
Right, they need not be AIXI-like. Which is precisely why I don’t like it when we attempt to do FAI research under the assumption of AIXI-like-ness.
(edit: I think I might understand after all; it sounds like you’re claiming AIXI-like things are unlikely to be useful since they’re based mostly on preconceptions that are likely false?)
I don’t think I understand what you mean here. Everyone favors modeling based on real evidence as opposed to fake evidence, and everyone favors avoiding the import of false preconceptions. It sounds like you prefer more constructive approaches?
I agree if you’re saying that we shouldn’t assume AIXI-like-ness to define the field. I disagree if you’re saying it’s a waste for people to explore that idea space though: it seems ripe to me.
I don’t think it’s an active waste of time to explore the research that can be done with things like AIXI models. I do, however, think that, for instance, flaws of AIXI-like models should be taken as flaws of AIXI-like models, rather than generalized to all possible AI designs.
So, for example, some people (on this site and elsewhere) have said we shouldn’t presume that a real AGI or real FAI will necessarily use VNM utility theory to make decisions. For various reasons, I think exploring that idea-space is a good idea: relaxing the VNM utility and rationality assumptions can take us closer both to how real, actually-existing minds work and to how we normatively want an artificial agent to behave.
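For reference, the assumption being relaxed is this: if an agent’s preferences \(\succeq\) over lotteries satisfy completeness, transitivity, continuity, and independence, the von Neumann–Morgenstern theorem says there is a utility function \(u\) such that

\[
L \succeq M \iff \mathbb{E}_{L}[u] \ge \mathbb{E}_{M}[u],
\]

so “relaxing VNM” concretely means dropping or weakening one of those four axioms.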
Modulo nitpicking, agreed on both points.
I offered the transform as an example of how things can mathematically factor, so, like I said, that may not be what the solution looks like. My feeling is that it’s too soon to throw out anything that might look like that pattern, though.
That sounds plausible, but how do you start to reason about such models of computation if they haven’t even been properly defined yet?
Formally, you don’t. Informally, you might try approximate definitions and see how they fail to capture elements of reality, or you might try to find analogies to other situations that have been modeled well and try to capture similar structure. Mathematicians et al. usually don’t start new fields of inquiry from a set of definitions; they start from an intuition grounded in reality and previously discovered mathematics and iterate until the field takes shape. Although I’m not a physicist, the possibly incorrect story I’ve heard is that Feynman path integrals are a great example of this.