I just think the terminology is a bit confusing, since many people use “utility maximizer” in the informal sense of “a gadget that tries to get as much utility as it can” rather than “something that finds the absolutely optimal strategy”.
This is a common sentiment, but note that a meliorizer is far from a maximizer even in the non-optimal sense. A meliorizer doesn’t do any intelligent navigation of the search space, it just takes the first thing it can find which is better than its fallback plan.
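To make the contrast concrete, here is a toy sketch (my own illustration; the candidate generator and utility function are hypothetical stand-ins, not anything from the post):

```python
# Toy illustration of meliorizing vs. (informal) maximizing.
# `candidates` yields plans in whatever order the search happens to produce them;
# `utility` scores a plan. Both are hypothetical stand-ins.

def meliorize(candidates, utility, fallback):
    """Return the first plan found that beats the fallback; no further search."""
    baseline = utility(fallback)
    for plan in candidates:
        if utility(plan) > baseline:
            return plan  # stop at the first improvement over the fallback
    return fallback

def maximize_informally(candidates, utility, fallback, budget=1000):
    """Keep searching (within a budget) for the best plan it can find."""
    best = fallback
    for i, plan in enumerate(candidates):
        if i >= budget:
            break
        if utility(plan) > utility(best):
            best = plan
    return best
```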
The quantifier is in the right place.
I’m confused about the difference between Tκ(α) |- φ in (2) and @_Tκ(α) "φ": don’t they both mean “Tκ(α) proves φ”?
Side note: to get underscores in comments, escape them like\_this.
In addition, it has access to a special “oracle”
I’m also confused about how this oracle cashes out. The parent agent needs to prove k > n in your formulation; what does this proof look like? If the parent agent has access to an axiom “forall n. k > n” then the system is inconsistent. If, instead, the oracle cashes out as a formula psi such that the parent can always prove psi(n) but cannot prove “forall n. psi(n)”, then you’ve re-invented Marcello’s waterfall :-)
I suspect that you’re trying to say something else, but I’m not sure what it is. What specifically does it mean for a theory of first-order logic to have a “black box” that “introduces assumptions of the form k > n”? I suspect that if it doesn’t cash out to something equivalent to the waterfall then there’s going to be some trouble with the parent agent trusting that its child agent has access to an accurate black box.
I would be very grateful for any advice on how to get people at MIRI interested.
PDFs and concise statements of the benefits (“look, I think I found a way to get a tiling agent that does X”) would make it easier for us to parse. Benja and I are both quite busy right now (there’s a workshop on), but I’m sure we can get in touch more directly sometime in the coming weeks. Again, I’m excited by your progress!
This is a common sentiment, but note that a meliorizer is far from a maximizer even in the non-optimal sense. A meliorizer doesn’t do any intelligent navigation of the search space, it just takes the first thing it can find which is better than its fallback plan.
Fair enough: I amended the text to be (hopefully) less confusing.
I’m confused about the difference between Tκ(α) |- φ in (2) and @_Tκ(α) “φ”: don’t they both mean “Tκ(α) proves φ”?
The difference is mainly notational: the turnstile is used instead of the box “@” since this is supposed to be a statement in the meta-theory we use to discuss the formalism, rather than in the theory used by the agent itself. This is the way I understood Yudkowsky and Herreshoff to be using that notation (see equation 2.1, for example), and I tried to stick to it. But I might be confused about something.
I’m also confused about how this oracle cashes out.
The agent is a program that interacts with the oracle. Suppose n is a number for which the oracle has affirmatively answered “alpha_kappa > alpha_n?” by the given point in the computation. Suppose further that the program has (independently) produced a formal Tκ(α)-proof of a theorem of the form “g(n,κ)=1 → (A_0()=b_0 → G)”. Then the agent is licensed to perform action b_0, by the construction of the agent. It doesn’t literally “prove” the premise “g(n,κ)=1” at any point.
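Schematically, the control flow is something like this (an illustrative sketch only; the oracle interface, the proof search, and the statement encoding are stand-ins for the actual construction):

```python
# Schematic sketch of the licensing logic described above.
# `oracle_affirms(n)` stands in for the oracle answering "alpha_kappa > alpha_n?";
# `find_proof(stmt)` stands in for the search for a formal T_kappa(alpha)-proof.

def choose_action(oracle_affirms, find_proof, candidate_actions, fallback, max_n=10**6):
    for b0 in candidate_actions:
        for n in range(max_n):
            if not oracle_affirms(n):
                continue  # only n's the oracle has affirmed so far can license an action
            # The agent proves the *implication*; it never proves the premise
            # "g(n, kappa) = 1" itself.
            if find_proof(f"g({n}, kappa) = 1 -> (A_0() = {b0} -> G)"):
                return b0  # licensed to perform b0, by construction of the agent
    return fallback
```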
I suspect that if it doesn’t cash out to something equivalent to the waterfall then there’s going to be some trouble with the parent agent trusting that its child agent has access to an accurate black box.
There is no such problem, at least in the simple “A^alpha”-like formalism. Note that the goal sentence G is typically defined in terms of the sequence of actions a_m, which is defined recursively by A_m() = a_m = (a_m, A_{m+1}). Here “B()” is a formal expression that depends on kappa, since by definition it is the result of the computation by the program B interacting with a valid “kappa-oracle”.
In the more general formalism of updateless intelligence, there is no problem as long as self-modification amounts to reprogramming of the same hardware with the same “box”. Validation of a completely new “box” is more difficult, but possible at least to some finite level of confidence: see the discussion at the end of the previous post. The latter problem is something I don’t claim to have solved to my complete satisfaction.
PDFs and concise statements of the benefits (“look, I think I found a way to get a tiling agent that does X”) would make it easier for us to parse.
I apologize for asking stupid questions, but I have neither worked in academia nor defended a thesis, which leaves me unacquainted with the standard tools. Is it important to produce a PDF, or is LaTeX source enough? If it is the former, what is the recommended freeware way of producing PDFs under Windows?
… I’m sure we can get in touch more directly sometime in the coming weeks.
That would be great! Again, thx for taking the time to read my work and discuss it!
I recommend writelatex for writing LaTeX; it will make it quite easy to render a PDF.
I already rendered your argument (for the non-ordinal version) in LaTeX, both to make it easier for me to understand and so you can see the commands that are commonly used to render stuff like this. You can read it here; I’ve PM’d you an edit link.
this is supposed to be a statement in the meta-theory
Ah, I see. I misunderstood. You’re saying that it has an infinite action condition
k > 0 -> (do(a) -> G)
or k > 1 -> (do(a) -> G)
or k > 2 -> (do(a) -> G)
which in the meta-language we can express as (2).
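Written out as one formula (just the family above with the obvious continuation, at the meta-level):

```latex
% The infinite family of action conditions above, with the obvious continuation
% (a meta-level paraphrase, not a quote of (2)):
\[
\big(k > 0 \rightarrow (\mathrm{do}(a) \rightarrow G)\big)
\,\vee\, \big(k > 1 \rightarrow (\mathrm{do}(a) \rightarrow G)\big)
\,\vee\, \big(k > 2 \rightarrow (\mathrm{do}(a) \rightarrow G)\big)
\,\vee\, \cdots
\]
```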
I’m pretty sure that this still reduces to vanilla PP. Consider: your proof goes through if the action condition is simply k > 0 -> (do(a) -> G). The child can trivially prove k + 1 > 0, so the parent is demanding only that the child prove G[k\k+1] which is fine, except the parent still doesn’t trust that the k’th child is sound. An extended version of this argument can be found in the paper linked above.
(If I’ve misinterpreted, please correct me again. If I’m correct, please don’t be discouraged—it’s great to see people working on new variations.)
Thx a dozen for writing that latex and recommending writelatex!
Indeed, in the formalism from the previous post (which you very kindly latexed) the agent thinks the tiling is going to effectively end after kappa steps (i.e. that the following agents won’t operate by looking for arbitrary proofs of theorems in a theory of the same power). This is exactly the problem the current post sets out to solve. By working with ordinals instead of natural numbers, the agent can prove the tiling is infinite by verifying that alpha_kappa >= omega.
Moreover, even the formalism in the previous post does not reduce to vanilla PP. This is because, as opposed to what you wrote in the document, the goal G doesn’t say something like “the world is safe for kappa steps”. Indeed, the proof of tiling doesn’t assume anything about G! For example, assuming that the motor action a_m is a natural number, G can be “at step m, the motor action must equal m^2”. There is no kappa-time-limit in sight: kappa only enters G through the oracle’s role in the agents’ computations. This is important when passing from goals to expected utilities, since I want the agent to be able to prove non-trivial bounds on the expected utility, which would be impossible if the utility only depended on the time interval [0, kappa] with nothing known about kappa.
Indeed, the proof of tiling doesn’t assume anything about G!
Ah, sorry, I assumed we were using the original PP assumptions. In that case, I’m guessing that such an agent is vulnerable to the procrastination paradox. Unfortunately, I don’t have any more time to toy with this today.
I’d love to see a walkthrough (first in the previous system, then in the ordinal system) of how the proof goes through with a goal of the form “there exists a time at which the button becomes pressed” where the parent agent is attempting to construct a perfect copy of itself and where the fallback action 0 is “press the button”. Could you provide one? I’m still a bit unclear about how the system works, and I think the walkthrough would help.
OK, first of all, I found an error in what I wrote: condition (3) was missing a piece which I erroneously thought was redundant. I fixed it in both posts.
In that case, I’m guessing that such an agent is vulnerable to the procrastination paradox.
It surely is. However, the problem goes away once we restrict to goals G such that, given a sequence {a_m} that doesn’t satisfy G, there is an initial segment {a_m}_{m<N} such that any sequence beginning with this initial segment also doesn’t satisfy G. For example, assume the “motor action space” consists of natural numbers and the goal is as follows: either a_m = m^2+1 for all m, or a_m = 0 for the first m violating the previous equation. Here 0 serves as the “fallback action”. The agent is guaranteed to achieve any goal of this form.
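To spell out the finite-witness property for this example, here is a small illustrative sketch (my own, not part of the formalism): any failure of the goal is already visible in some finite prefix.

```python
# Sketch: for the example goal, any violation is already settled by a finite prefix.
# Goal: a_m = m^2 + 1 for all m, OR a_m = 0 at the first m where that equation fails.

def prefix_refutes_goal(prefix):
    """Return True if every extension of `prefix` fails the goal (the prefix already loses)."""
    for m, a in enumerate(prefix):
        if a != m ** 2 + 1:
            # First violation: the goal now demands the fallback action 0 here.
            return a != 0
    return False  # no violation yet; some extension still satisfies the goal

assert not prefix_refutes_goal([1, 2, 5])  # on track: a_m = m^2 + 1 so far
assert not prefix_refutes_goal([1, 0])     # fell back correctly at the first violation
assert prefix_refutes_goal([1, 7])         # violated at m = 1 without falling back: unrecoverable
```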
I’d love to see a walkthrough (first in the previous system, then in the ordinal system) of how the proof goes through with a goal of the form “there exists a time at which the button becomes pressed” where the parent agent is attempting to construct a perfect copy of itself and where the fallback action 0 is “press the button”. Could you provide one? I’m still a bit unclear about how the system works, and I think the walkthrough would help.
I’m guessing that the unclear part is how conditions (3) and (4) are satisfied. I’ll use the goal I described in the previous paragraph to avoid the procrastination paradox (otherwise the tiling still happens, but the goal isn’t guaranteed to actually be achieved). Let’s start with the “finite” system.
The sequence A_m is recursively defined within T_κ by A_m() = a_m = (a_m, A_{m+1}); here the outer a_m denotes the full action (a pair), while the a_m inside the pair is the motor action. The motor actions a_m are what G is defined in terms of.
Condition (4) is T_κ |- (A_0() = b_0 and B_1() = 0) → G. Because of the way fallback works with this goal, we need to introduce the further assumption that the motor component b_0 equals 0^2+1 = 1, where the full action is b_0 = (b_0, B_1). The T_κ-proof of (4) now goes as follows. A_0() = b_0 implies A_1 = B_1. B_1() = 0 implies A_1() = 0, which implies a_1 = 0. A_0() = b_0 also implies a_0 = b_0 = 1. Finally, a_0 = 1 and a_1 = 0 imply G by the definition of G.
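Written as a chain of implications (a transcription of the argument above, in the notation of the post):

```latex
% The T_kappa-proof of condition (4) as a chain of implications.
\begin{align*}
A_0() = b_0 &\;\Rightarrow\; A_1 = B_1 \\
B_1() = 0 &\;\Rightarrow\; A_1() = 0 \;\Rightarrow\; a_1 = 0 \\
A_0() = b_0 &\;\Rightarrow\; a_0 = b_0 = 1 \\
a_0 = 1 \,\wedge\, a_1 = 0 &\;\Rightarrow\; G \quad \text{(by the definition of $G$)}
\end{align*}
```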
Condition (3) is T_κ |- For all b_1: (B_1() = b_1 → (b_1 = 0 or Exists n: (κ > n and @_{T_κ} “(κ+1 > n and B_1()[κ\κ+1] = b_1) → G[κ\κ+1]”))). This condition is T_κ-proved by inspecting B_1’s code. B_1 looks for T_κ-proofs of theorems of the form “(κ > n-1 and B_1()[κ\κ+1] = b_1) → G[κ\κ+1]”. It is licensed to act on such a proof only when the oracle affirms the stronger premise κ > n. Note that G[κ\κ+1] looks the same as G, except that the oracle used in evaluating A_m() is “twisted” (in a way that doesn’t affect the secret real implementation of the oracle). B_1 performs the fallback action 0 when it fails to find any proof as above. In practice, if a_0 = (0^2+1 = 1, B_1), then B_1 is able to prove this fact (for example by running A_0; more precisely, since its goal is twisted, it has to run a twisted A_0, but this doesn’t change the result). It can then go on to prove that a suitable motor action and a suitable successor agent will lead to the goal (since B_1 and G are twisted in the same way within the theorem). If B_1 is an “L-zombie”, i.e. it wasn’t really constructed by A_0, then it might a priori fail to find a suitable proof, in which case it will fall back.
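For readability, here is condition (3) again in display notation (my transcription of the statement above; the box denotes provability in T_κ):

```latex
% Condition (3) in display form (transcription of the statement above).
\[
T_\kappa \;\vdash\; \forall b_1 .\, \Big( B_1() = b_1 \rightarrow \big( b_1 = 0 \,\vee\,
  \exists n .\, \kappa > n \,\wedge\,
  \Box_{T_\kappa} \ulcorner (\kappa + 1 > n \,\wedge\, B_1()[\kappa \backslash \kappa{+}1] = b_1)
    \rightarrow G[\kappa \backslash \kappa{+}1] \urcorner \big) \Big)
\]
```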
I’ll stop at this point. If this clarification was insufficient, please point me in the direction of the problem. If it was sufficient for the “finite” system but the analogous argument for the ordinal system is unclear, tell me and I will produce it.
Hi, haven’t heard from you for a while.
The ordinal formalism turned out to be erroneous (see the remark I added at the beginning of the post). Regarding the oracle version of “usual” parametric polymorphism, it is in a way closer to Marcello’s waterfall than to the original parametric polymorphism: it doesn’t “lose proof strength”, but it is vulnerable to the procrastination paradox. On the other hand, I don’t think the procrastination paradox is a severe problem; it only means you need to require your utility function to be semicontinuous with respect to the product topology on the set of possible universes, where the product is over time slices. Also, the oracle formalism has an advantage over Marcello’s waterfall in that it doesn’t rely on an arbitrary choice of unprovable formula.
Sidenote: Is a workshop planned sometime soon? It would be very interesting for me to attend and exchange ideas with other people.
Benja and I have been pretty busy writing papers and attending some non-MIRI workshops. Eventually, we’ll get back around to feeling out this system, but I have not had the time to go back through your posts in depth so as to understand the motivation. It will take me some time and effort to give your system the attention it deserves, and I still haven’t been able to find that time :-)
There aren’t any workshops coming up in the immediate future, but did you hear about MIRIx workshops? If you set up your own local workshop, MIRI can reimburse certain expenses and may even send a researcher to the first workshop to give tutorials and/or chat, depending on the situation.