Well, to make a guaranteed friendly AI you probably need to prove theorems about your AI design. And our universe most likely contains many copies of everything. So figuring out the right decision theory in the presence of copies seems to be a necessary step on the road to FAI. I don’t speak for SingInst here, this is just how I feel.
Wouldn’t this be a level mismatch in a multi-level AI architecture? Like, proving things about the low-level neural computational substrate instead of about the conceptual level where the actual cognition would take place, and where the actual friendliness would be defined? [And this level can’t be isomorphic to any formal logical system, except in symbolic AI, which doesn’t work.]
And again, a conceptual-level understanding should do the trick. Like, knowing that I am playing PD against myself would be sufficient to cooperate. Besides, as EY frequently says, it’s really hard to find oneself in a true PD. Usually it’s an iterated PD, or some Schelling-style conflict game. [BTW, huge thanks for recommending his book in one of your posts!]
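A minimal sketch of the "PD against myself" point, assuming the usual textbook payoff ordering. The numbers and function names are illustrative and not from the thread. Against an exact copy only the diagonal outcomes are reachable, so cooperation wins; against an opponent whose move is fixed independently of mine, defection dominates.

```python
# Illustrative Prisoner's Dilemma payoffs (assumed, not from the thread).
# PAYOFF[(my_move, their_move)] is my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def best_move_against_copy():
    """An exact copy makes the same move I do, so only the diagonal
    outcomes (C, C) and (D, D) are reachable."""
    return max(["C", "D"], key=lambda move: PAYOFF[(move, move)])


def best_move_against_fixed(their_move):
    """Against an opponent whose move does not depend on mine,
    pick the payoff-maximizing reply to that fixed move."""
    return max(["C", "D"], key=lambda move: PAYOFF[(move, their_move)])


if __name__ == "__main__":
    print("vs. copy:", best_move_against_copy())
    print("vs. fixed C:", best_move_against_fixed("C"))
    print("vs. fixed D:", best_move_against_fixed("D"))
```

The hard part, which this toy skips, is establishing that the opponent really is a copy rather than something that merely looks like one.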
If a multilevel architecture (whatever it is) makes provable friendliness impossible, then FAI can’t use it.
I imagine the future FAI as closer to AIXI, which works fine without multilevel architecture, than to the Lisp programs of the 70s.
AFAICT, the general architecture that EY advocates (-ed?) in “Levels of Organization in GI” is multilevel. But this doesn’t automatically mean that it’s impossible to prove anything about it. Maybe it’s possible, just not using formal logic methods. [And so maybe getting not 100% certainty but (100 - 1e-N)%, which should be sufficient for large enough N.]
AIXI doesn’t work any better than the symbolic AI Lisp programs of the 70s did. I mean, the General Problem Solver would be superintelligent too, given infinite computing power.
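Eliezer says here:
“A good deal of the material I have ever produced – specifically, everything dated 2002 or earlier – I now consider completely obsolete. (...) I no longer consider LOGI’s theory useful for building de novo AI.”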
To make the General Problem Solver or any other powerful computing device do anything interesting in the real world, you need to give it a formal description that contains the real world as a special case. You could use the universal prior, which gives you AIXI. Or you could use the yet-unspecified prior of UDT, which gives you the yet-unspecified UDT-AIXI.
The central difficulty of decision theory doesn’t get easier if you have lots of computing power. Imagine you’re playing the PD against someone. You both know each other’s source code, but you have a halting oracle and your opponent doesn’t. With so much power, what do you do? I simulate the other guy and… whoops, didn’t think this through. Looks like I must avoid looking at the result. Hmmm.
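One way to read the halting-oracle puzzle, as a toy model under loud assumptions: the weaker opponent cannot simulate you, but it can read your source well enough to see how you would reply to each of its moves. The policy names, the opponent's rule, and the payoffs are all hypothetical. If your plan is "simulate, then best-respond to what I see", the opponent sees that plan in your source and defects.

```python
# Illustrative payoffs (assumed): PAYOFF[(my_move, their_move)] is my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def best_response(their_move):
    """Look at the simulated result and maximize against it (always 'D')."""
    return max("CD", key=lambda move: PAYOFF[(move, their_move)])


def mirror(their_move):
    """Commit to copying whatever the simulation says the opponent does."""
    return their_move


def weaker_opponent(your_policy):
    """Hypothetical opponent without an oracle: it reads your source and
    cooperates only if you reward cooperation and punish defection."""
    rewards_c = your_policy("C") == "C"
    punishes_d = your_policy("D") == "D"
    return "C" if rewards_c and punishes_d else "D"


def play(your_policy):
    """Return (your payoff, opponent's payoff) for a given policy."""
    their_move = weaker_opponent(your_policy)  # the oracle-backed simulation
    my_move = your_policy(their_move)
    return PAYOFF[(my_move, their_move)], PAYOFF[(their_move, my_move)]


if __name__ == "__main__":
    print("best_response:", play(best_response))
    print("mirror:", play(mirror))
```

In this toy the extra power only pays off if you bind yourself in advance to how the simulation result will be used, which is roughly the "avoid looking at the result" worry.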
Oh… LOGI’s totally relinquished then? They should mark the paper as completely obsolete in the list of SI publications, otherwise it’s confusing :) I was under the impression that I had read a relatively recent text of Eliezer’s where he says that prospective FAI researchers must thoroughly understand LOGI before moving on to the current, even more advanced, undisclosed architecture...
Yes, this is an interesting problem. And it appears to produce some neat metaphors. Like: maintain the illusion of free will by deliberately avoiding knowing your own decision in advance [or go crazy]. And avoid dehumanizing your opponents [or get defected against].
But does it remain relevant given limits on computing power? [= assuming neither simulation nor any kind of formal proof is feasible]
That sounds weird. Can you find a link?
That seems to be a much stronger assumption than just limiting computing power. It can be broken by one player strategically weakening themselves, if they can benefit from being simulated.
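A rough sketch of the "strategic weakening" idea, with everything here hypothetical: agents are modeled as Python generators that yield once per computation step, and the bounded player can afford only a fixed step budget. An agent that makes itself simple enough fits inside the budget and can be simulated even though computing power is limited.

```python
def simulate(program, budget):
    """Run a generator-based program for at most `budget` steps.
    Return its move, or None if the budget runs out first."""
    steps = program()
    for _ in range(budget):
        try:
            next(steps)
        except StopIteration as done:
            return done.value  # the generator's return value is its move
    return None


def complicated_agent():
    """Takes far more steps than a bounded opponent can afford."""
    for _ in range(10**6):
        yield
    return "C"


def simplified_agent():
    """A deliberately weakened agent: one step, then cooperate."""
    yield
    return "C"


def bounded_player(opponent, budget=100):
    """Cooperate only if the opponent can be simulated within the budget
    and is seen to cooperate; otherwise defect."""
    return "C" if simulate(opponent, budget) == "C" else "D"


if __name__ == "__main__":
    print("vs. complicated:", bounded_player(complicated_agent))
    print("vs. simplified:", bounded_player(simplified_agent))
```

Both agents would cooperate, but only the simplified one gets cooperation back, which is the sense in which being easy to simulate can be an advantage.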
This.
Are you sure this is possible? I tried to do this with the “impersonate other agents” strategy, but it does not seem to work if the opponent has your source code. The other agent knows you’re not actually them, just impersonating :)
It is possible to send out a different, simple program instead of yourself (or to fully self-modify into that program, there is no difference), but that would be a wholly different (and easily solvable) problem from the original one.
Ouch, that text sounds painful, it’s probably about as old as LOGI.
Well, not quite that old, but yes, not very recent. The Internet Archive says the page was created at the end of 2009, but it was probably not done by EY himself. The earliest reference Google gives is from 2007...
So, you’re saying that the party line is now on single-level, formal-system-style architectures? But does it even make sense to try to define FAI-meaningful concepts in such an architecture? Isn’t it like trying to define ‘love’, ‘freedom’, and ‘justice’ in terms of atoms?
I remember EY saying somewhere (can’t find where now) that the AIXI design was very commendable in the sense that here, finally, is a full AGI design that can be clearly shown to kill you :)
Here is a 2003 reference to the original SL4 wiki post, which is still online but for some reason not indexed by Google.
I only know what the decision theory folks are doing, don’t know about the SingInst party line.
Formally defining “love” may be easier than you think. For example, Paul Christiano’s blog has some posts about using “pointers” to our world: take a long bitstring, like the text of Finnegans Wake, and tell the AI to influence whatever algorithm was most likely to produce that string under the universal prior. Also I have played with the idea of using UDT to increase the measure of specified bitstrings. Such ideas don’t require knowing correct physics down to the level of atoms, and I can easily imagine that we may find a formal way of pointing the AI at any human-recognizable idea without going through atoms.
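A toy illustration of the "pointer" idea, heavily simplified. The real universal prior ranges over all programs and is uncomputable, so this sketch swaps in a tiny hand-picked list of hypothetical candidates and weights each by 2^-(source length in bits). The candidate list and all names are assumptions made up for illustration.

```python
# Hypothetical candidates: (source_code, output_string) pairs.
# This finite list is a stand-in for "all programs" and nothing more.
CANDIDATES = [
    ("print('ab'*4)", "abababab"),
    ("print('abababab')", "abababab"),
    ("print('abcabcab')", "abcabcab"),
]


def prior_weight(source):
    """Shorter programs get more weight: 2^-(length in bits)."""
    return 2.0 ** (-8 * len(source))


def most_likely_generator(target):
    """Return the highest-weight candidate source whose output equals
    `target`, or None if no candidate produces it."""
    matches = [src for src, out in CANDIDATES if out == target]
    return max(matches, key=prior_weight, default=None)


if __name__ == "__main__":
    print(most_likely_generator("abababab"))
```

The string itself never appears in the goal; it only serves to pick out the process that most plausibly produced it, which is what the AI would then be told to influence.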
Thanks for the reference! I skimmed over the blog, and wow! The amount of seriously considered weirdness is staggering :) (like: acausal counterfactual takeover by a simulating UFAI!). It is of huge entertainment value, of course, but… most of it appears to be conditioned on blatantly impossible premises, so it’s hard to take the concerns seriously. Maybe it’s a lack of imagination on my part...
Regarding the solution to defining complex concepts via low-level inputs, as far as I understood the idea, you do not remove the multi-leveledness, just let it be inferred internally by the AI and refuse to look at how it is done. Besides, it does not appear to solve the problem: metaphorically speaking, we are not really interested in getting the precise text (Finnegans Wake) down to its last typo, but in a probability measure over all possible texts, which is concentrated on texts that are “sufficiently similar”. In fact, we are most interested in defining this similarity, which is extremely intricate and non-trivial (it may include, for example, translations into other languages).
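To make the objection concrete, here is a crude sketch of a probability measure concentrated on "sufficiently similar" texts. It assumes character-level similarity from Python's standard `difflib` as a stand-in for the real notion of similarity; the `sharpness` parameter and the sample strings are hypothetical. The stand-in is exactly what the comment says is inadequate: a translation scores low even though a human would call it the same text.

```python
import difflib
import math


def similarity(a, b):
    """Character-level similarity in [0, 1]; a crude stand-in only."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def measure_over_texts(target, candidates, sharpness=10.0):
    """Normalized weights exp(sharpness * similarity) over the candidates,
    so mass concentrates on texts close to `target`."""
    weights = {
        text: math.exp(sharpness * similarity(target, text))
        for text in candidates
    }
    total = sum(weights.values())
    return {text: weight / total for text, weight in weights.items()}


if __name__ == "__main__":
    target = "riverrun, past Eve and Adam's"
    candidates = [
        target,                                    # exact text
        "riverrun, past Eve and Adan's",           # one typo
        "cours de la rivière, passé Ève et Adam",  # rough translation
        "the quick brown fox",                     # unrelated
    ]
    for text, mass in measure_over_texts(target, candidates).items():
        print(f"{mass:.3f}  {text}")
```

Swapping in a better `similarity` is the whole problem; the normalization around it is the easy part.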
Your comment reminded me of a post I’ve long wanted to write. The idea is that examining assumptions is unproductive. The only way to make intellectual progress, either individually or as a group, is to stop arguing about assumptions and instead explore their implications wherever they might lead. The #1 reason why I took so long to understand Newcomb’s Problem or Counterfactual Mugging was my insistence on denying the assumptions behind these problems. Instead I should have said to myself, okay, is this direction of inquiry interesting when taken on its own terms?
Many assumptions seemed divorced from real life at first, e.g. people dismissed the study of electromagnetism as an impractical toy, and considered number theory hopelessly abstract until cryptography arrived. People seem unable to judge the usefulness of assumptions before exploring their implications in detail, but they absolutely love arguing about assumptions instead of getting anything done.
There, thanks for encouraging me to write the first draft :-)
Absolutely, I agree of course. If a line of inquiry is interesting in itself and progress is being made, why not pursue it? My question was only about its direct relevance to FAI. Or, rather, whether the arguments that I made to myself about its non-relevance can be easily refuted.
And, you know, questioning assumptions can sometimes be useful too. From a false assumption anything follows :)
In any case, I’m glad to be of service, however small. Your posts are generally excellent.
Interesting. Do you see any current arguments over assumptions that we should stop (on LW or elsewhere)?
Hmm, looks like I sometimes attack people for starting from (what I consider) wrong assumptions. Maybe I should rethink my position on those issues.