Indeed, the proof of tiling doesn’t assume anything about G!
Ah, sorry, I assumed we were using the original PP assumptions. In that case, I’m guessing that such an agent is vulnerable to the procrastination paradox. Unfortunately, I don’t have any more time to toy with this today.
I’d love to see a walkthrough (first in the previous system, then in the ordinal system) of how the proof goes through with a goal of the form “there exists a time at which the button becomes pressed” where the parent agent is attempting to construct a perfect copy of itself and where the fallback action 0 is “press the button”. Could you provide one? I’m still a bit unclear about how the system works, and I think the walkthrough would help.
OK, firstly I found an error in what I wrote, namely condition (3) was missing a piece which I erroneously thought was redundant. Fixed it in both posts.
In that case, I’m guessing that such an agent is vulnerable to the procrastination paradox.
It surely is. However, the problem goes away once we restrict to goals G such that given a time sequence {a_m} that doesn’t satisfy G, there is an initial segment {a_m}_{m<N} s.t. any sequence beginning with this initial segment also doesn’t satisfy G. For example, assume the “motor action space” consists of natural numbers and the goal is as follows: either a_m = m^2+1 for all m or a_m = 0 for the first m violating the previous equation. 0 serves us as the “fallback action”. The agent is guaranteed to achieve any goal of this form.
I’d love to see a walkthrough (first in the previous system, then in the ordinal system) of how the proof goes through with a goal of the form “there exists a time at which the button becomes pressed” where the parent agent is attempting to construct a perfect copy of itself and where the fallback action 0 is “press the button”. Could you provide one? I’m still a bit unclear about how the system works, and I think the walkthrough would help.
I’m guessing that the unclear part is how are conditions (3) and (4) satisfied. I’ll use the goal I described in the previous paragraph to avoid the procrastination paradox (otherwise the tiling still happens but the goal isn’t guaranteed to be really achieved). Let’s start with the “finite” system.
The sequence Am is recursively defined within T_κ by A_m() = a_m = (a_m, A_{m+1}). Here a_m are the motor actions which are used to define G.
Condition (4) is T_κ | (A0() = b_0 and B1() = 0) → G. Because of the way fallback works with this goal, we need to introduce the further assumption that b_0 = 0^2+1 = 1 where b_0 = (b_0, B0). The T_κ-proof of (4) now goes as follows. A0() = b_0 implies A1 = B1. B1() = 0 implies A1() = 0 implies a_1 = 0. A0() = b_0 also implies a_0 = b_0 = 1. a_0 = 1 and a_1 = 0 imply G by definition of G.
Condition (3) is T_κ | For all b_1: (B1() = b_1 → (b_1 = 0 or Exists n: (κ > n and @_{T_κ} “(κ+1 > n and B1()[κ\κ+1] = b_1) → G[κ\κ+1]”))). This condition is T_κ-proved by inspecting B1′s code. B1 looks for T_κ-proofs of theorems of the form “κ > n-1 and B1()[κ\κ+1] = b_1 → G[κ\κ+1]”. It is licensed to act on such a proof only when the oracle affirms the stronger premise κ > n. Note that G[κ\κ+1] looks the same as G except that the oracle used in evaluating Am() is “twisted” (in a way that doesn’t affect the secret real implementation of the oracle). B1 performs fallback action 0 when it fails to find any proof as above. In practice, if a_0=(0^2+1=1,B1) then B1 is able to prove it (for example by running A0; more precisely since its goal is twisted it has to run a twisted A0 but it doesn’t change the result). It can then go on to prove that a suitable motor action and a suitable successor agent will lead to the goal (since B1 and G are twisted in the same way within the theorem). If B1 is an “L-zombie” i.e. it wasn’t really constructed by A0 then it a priori might fail to find a suitable proof in which case it will fallback.
I’ll stop at this point. If this clarification was insufficient, please point me in the direction of the problem. If this clarification was sufficient for the “finite” system but the analogical argument for the ordinal system is unclear, tell me and I will produce it.
The ordinal formalism turned out to be erroneous (see remark I added in beginning of post). Regarding the oracle version of “usual” parametric polymorphism, it is in a way closer to Marcello’s waterfall than to the original parametric polymorphism: it doesn’t “lose proof strength” but it is vulnerable to the procrastination paradox. On the other hand I don’t think the procrastination paradox is a severe problem, it only means you need to require your utility function to be semicontinuous wrt the product topology of the set of possible universes, where the product is over time slices. Also, the oracle formalism has an advantage over Marcello’s waterfall in that it doesn’t rely on an arbitrary choice of unprovable formula.
Sidenote: Is a workshop planned sometime soon? It would be very interesting for me to attend and exchange ideas with other people.
Benja and I have been pretty busy writing papers and attending some non-MIRI workshops. Eventually, we’ll get back around to feeling out this system, but I have not had the time to go back through your posts in depth so as to understand the motivation. It will take me some time and effort to give your system the attention it deserves, and I still haven’t been able to find that time :-)
There aren’t any workshops coming up in the immediate future, but did you hear about MIRIx workshops? If you set up a your own local workshop, MIRI can reimburse certain expenses and may even send a researcher to the first workshop to give tutorials and/or chat, depending on the situation.
Ah, sorry, I assumed we were using the original PP assumptions. In that case, I’m guessing that such an agent is vulnerable to the procrastination paradox. Unfortunately, I don’t have any more time to toy with this today.
I’d love to see a walkthrough (first in the previous system, then in the ordinal system) of how the proof goes through with a goal of the form “there exists a time at which the button becomes pressed” where the parent agent is attempting to construct a perfect copy of itself and where the fallback action 0 is “press the button”. Could you provide one? I’m still a bit unclear about how the system works, and I think the walkthrough would help.
OK, firstly I found an error in what I wrote, namely condition (3) was missing a piece which I erroneously thought was redundant. Fixed it in both posts.
It surely is. However, the problem goes away once we restrict to goals G such that given a time sequence {a_m} that doesn’t satisfy G, there is an initial segment {a_m}_{m<N} s.t. any sequence beginning with this initial segment also doesn’t satisfy G. For example, assume the “motor action space” consists of natural numbers and the goal is as follows: either a_m = m^2+1 for all m or a_m = 0 for the first m violating the previous equation. 0 serves us as the “fallback action”. The agent is guaranteed to achieve any goal of this form.
I’m guessing that the unclear part is how are conditions (3) and (4) satisfied. I’ll use the goal I described in the previous paragraph to avoid the procrastination paradox (otherwise the tiling still happens but the goal isn’t guaranteed to be really achieved). Let’s start with the “finite” system.
The sequence Am is recursively defined within T_κ by A_m() = a_m = (a_m, A_{m+1}). Here a_m are the motor actions which are used to define G.
Condition (4) is T_κ | (A0() = b_0 and B1() = 0) → G. Because of the way fallback works with this goal, we need to introduce the further assumption that b_0 = 0^2+1 = 1 where b_0 = (b_0, B0). The T_κ-proof of (4) now goes as follows. A0() = b_0 implies A1 = B1. B1() = 0 implies A1() = 0 implies a_1 = 0. A0() = b_0 also implies a_0 = b_0 = 1. a_0 = 1 and a_1 = 0 imply G by definition of G.
Condition (3) is T_κ | For all b_1: (B1() = b_1 → (b_1 = 0 or Exists n: (κ > n and @_{T_κ} “(κ+1 > n and B1()[κ\κ+1] = b_1) → G[κ\κ+1]”))). This condition is T_κ-proved by inspecting B1′s code. B1 looks for T_κ-proofs of theorems of the form “κ > n-1 and B1()[κ\κ+1] = b_1 → G[κ\κ+1]”. It is licensed to act on such a proof only when the oracle affirms the stronger premise κ > n. Note that G[κ\κ+1] looks the same as G except that the oracle used in evaluating Am() is “twisted” (in a way that doesn’t affect the secret real implementation of the oracle). B1 performs fallback action 0 when it fails to find any proof as above. In practice, if a_0=(0^2+1=1,B1) then B1 is able to prove it (for example by running A0; more precisely since its goal is twisted it has to run a twisted A0 but it doesn’t change the result). It can then go on to prove that a suitable motor action and a suitable successor agent will lead to the goal (since B1 and G are twisted in the same way within the theorem). If B1 is an “L-zombie” i.e. it wasn’t really constructed by A0 then it a priori might fail to find a suitable proof in which case it will fallback.
I’ll stop at this point. If this clarification was insufficient, please point me in the direction of the problem. If this clarification was sufficient for the “finite” system but the analogical argument for the ordinal system is unclear, tell me and I will produce it.
Hi, haven’t heard from you for a while.
The ordinal formalism turned out to be erroneous (see remark I added in beginning of post). Regarding the oracle version of “usual” parametric polymorphism, it is in a way closer to Marcello’s waterfall than to the original parametric polymorphism: it doesn’t “lose proof strength” but it is vulnerable to the procrastination paradox. On the other hand I don’t think the procrastination paradox is a severe problem, it only means you need to require your utility function to be semicontinuous wrt the product topology of the set of possible universes, where the product is over time slices. Also, the oracle formalism has an advantage over Marcello’s waterfall in that it doesn’t rely on an arbitrary choice of unprovable formula.
Sidenote: Is a workshop planned sometime soon? It would be very interesting for me to attend and exchange ideas with other people.
Benja and I have been pretty busy writing papers and attending some non-MIRI workshops. Eventually, we’ll get back around to feeling out this system, but I have not had the time to go back through your posts in depth so as to understand the motivation. It will take me some time and effort to give your system the attention it deserves, and I still haven’t been able to find that time :-)
There aren’t any workshops coming up in the immediate future, but did you hear about MIRIx workshops? If you set up a your own local workshop, MIRI can reimburse certain expenses and may even send a researcher to the first workshop to give tutorials and/or chat, depending on the situation.