I also question the importance of working on this problem now, but for a somewhat different reason.
Part of where I’m coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another
My understanding is that Lobian issues make it impossible for a proof-based AI to decide to not immediately commit suicide, because it can’t prove that it won’t do something worse than nothing in the future. (Let’s say it will have the option to blow up Earth in the future. Since it can’t prove that its own proof system is consistent, it can’t prove that it won’t prove that blowing up Earth maximizes utility at that future time.) To me this problem looks more like a problem with making decisions based purely on proofs, and not much related to self-modification.
Using probabilities instead of proofs seems to eliminate the old obstructions, but it does leave a sequence of challenging problems (hence the work on probabilistic reflection). E.g., we’ve proved that there is an algorithm P using a halting oracle such that:
(Property R): Intuitively, we “almost” have a < P(X | a < P(X) < b) < b. Formally:
For each sentence X, each a, and each b, P(X AND a < P(X) < b) < b * P(a <= P(X) <= b).
For each sentence X, each a, and each b, P(X AND a <= P(X) <= b) > a * P(a < P(X) < b).
But this took a great deal of work, and we can’t exhibit any algorithm that simultaneously satisfies Property R and has P(Property R) = 1. Do you think this is not an important question? It seems to me that we don’t yet know how many of the Gödelian obstructions carry over to the probabilistic setting, and there are still real problems that will take ingenuity to resolve.
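To unpack the “almost” in Property R, here is a short derivation of the intuitive conditional form from the two formal inequalities. It is only a sketch under an added assumption that is not part of the statement above: the endpoints a and b carry no probability mass, and the conditioning event has positive probability. Without that assumption the open and closed intervals differ, which is exactly the gap the formal version leaves.

```latex
% Assumption (added for illustration): no mass at the endpoints, and a non-null event.
\[
P\bigl(P(X) = a\bigr) = P\bigl(P(X) = b\bigr) = 0,
\qquad
P\bigl(a < P(X) < b\bigr) > 0 .
\]
% Under that assumption the open and closed interval events have equal probability,
% so the two halves of Property R become
\[
P\bigl(X \wedge a < P(X) < b\bigr) < b \cdot P\bigl(a < P(X) < b\bigr),
\qquad
P\bigl(X \wedge a < P(X) < b\bigr) > a \cdot P\bigl(a < P(X) < b\bigr).
\]
% Dividing both inequalities by P(a < P(X) < b) > 0 gives the intuitive statement.
\[
a < P\bigl(X \mid a < P(X) < b\bigr) < b .
\]
```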
Putting the dangers of AI progress aside, we probably ought to first work on understanding logical uncertainty in general, and start with simpler problems. I find it unlikely that we can solve “probabilistic reflection” (or even correctly specify what the problem is) when we don’t yet know what principles allow us to say that P!=NP is more likely to be true than false. Do we even know that using probabilities is the right way to handle logical uncertainty? (People assumed that using probabilities is the right way to handle indexical uncertainty and that turned out to be wrong.)
we don’t yet know what principles allow us to say that P!=NP is more likely to be true than false
We have coherent answers at least. See e.g. here for a formalism (and similarly the much older stuff by Gaifman, which didn’t get into priors). MIRI is working much more directly on this problem as well. Can you think of concrete open questions in that space? Basically we are just trying to develop the theory, but having simple concrete problems would surely be good. (I have a bucket of standard toy problems to resolve, and don’t have a good approach that handles all of them, but it’s pretty easy to hack together a solution to them, so they don’t really count as open problems.)
I agree that AI progress is probably socially costly (highly positive for currently living folks, modestly negative for the average far future person). I think work with a theoretical bias is more likely to be helpful, and I don’t think it is very bad on net. Moreover, as long as safety-concerned folks are responsible for a very small share of all of the good AI work, the reputation impacts of doing good work seem very large compared to the social benefits or costs.
We don’t know that probabilities are the right way to handle logical uncertainty, nor that our problem statements are correct. I think that the kind of probabilistic reflection we are working on is fairly natural though.
I agree with both you and Nick that the strategic questions are very important, probably more important than the math. I don’t think that is inconsistent with getting the mathematical research program up and going. I would guess that all told the math will help on the strategy front via building the general credibility of AI safety concern (by 1. making it clear that there are concrete policy-relevant questions here, and 2. building status and credibility for safety-concerned communities and individuals), but even neglecting that I think it would still be worth it.
We have coherent answers at least. See e.g. here for a formalism (and similarly the much older stuff by Gaifman, which didn’t get into priors).
I read that paper before but it doesn’t say why its proposed way of handling logical uncertainty is the correct one, except that it “seem to have some good properties”. It seems like we’re still at a stage where we don’t understand logical uncertainty at a deep level and can’t yet offer solutions based on fundamental principles, but are just trying out various ideas to see what sticks.
I agree that AI progress is probably socially costly [...] Moreover, as long as safety-concerned folks are responsible for a very small share of all of the good AI work, the reputation impacts of doing good work seem very large compared to the social benefits or costs.
I’m not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?
I think that the kind of probabilistic reflection we are working on is fairly natural though.
Maybe I’m just being dense but I’m still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn’t seem to make a strong attempt to explain it either.
I read that paper before but it doesn’t say why its proposed way of handling logical uncertainty is the correct one, except that it “seem to have some good properties”.
This is basically the same as the situation with respect to indexical probabilities. There are dominance arguments for betting odds etc. that don’t quite go through, but it seems like probabilities are still distinguished as a good best guess, and worth fleshing out. And if you accept probabilities, prior specification is the clear next question.
I’m not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?
I think it’s plausible there are net social costs, excluding reputational impacts, and would certainly prefer to think more about it first. But with reputational impacts it seems like the case is relatively clear (of course this is potentially self-serving reasoning), and there are similar gains in terms of making things seem more concrete etc.
Maybe I’m just being dense but I’m still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn’t seem to make a strong attempt to explain it either.
Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality), and the second claim was that the epsilons aren’t so bad (e.g. because exact comparisons between floats are kind of silly anyway). Probably those could be more explicit in the writeup, but it would be helpful to know which steps seem shakiest.
Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality)
Why do you say “exactly corresponded to reality”? You’d have an inner symbol which corresponded to the outer P, but P must be more like subjective credence than external reality, since in reality each logical statement is presumably either true or false, not a probabilistic mixture of both?
Intuitively, what I’d want is a “math intuition module” which, if it was looking at a mathematical expression denoting the beliefs that a copy of itself would have after running for a longer period of time or having more memory, would assign high probability that those beliefs would better correspond to reality than its own current beliefs. This would in turn license the AI using this MIM to build a more powerful version of itself, or just to believe that “think more” is generally a good idea aside from opportunity costs. I understand that you are not trying to directly build such an MIM, just to do a possibility proof. But your formalism looks very different from my intuitive requirement, and I don’t understand what your intuitive requirement might be.
P is intended to be like objective reality, exactly analogously with the predicate “True.” So we can adjoin P as a symbol and the reflection principle as an axiom schema, and thereby obtain a more expressive language. Depending on architecture, this also may increase the agent’s ability to formulate or reason about hypotheses.
Statements without P’s in them are indeed either true or false with probability 1. I agree it is a bit odd for statements with P in them to have probabilities, but I don’t see a strong argument that it shouldn’t happen. In particular, it seems irrelevant to anything meaningful we would like to do with a truth predicate. In subsequent versions of this result, the probabilities have been removed and the core topological considerations exposed directly.
The relationship between a truth predicate and the kind of reasoning you discuss (a MIM that believes its own computations are trustworthy) is that truth is useful or perhaps necessary for defining the kind of correspondence that you want the MIM to accept, about a general relationship between the algorithm it is running and what is “true”. So having a notion of “truth” seems like the first step.
I would guess that all told the math will help on the strategy front via building the general credibility of AI safety concern
Also, by attracting thinkers who can initially only be attracted by crisp technical problems, but as they get involved, will turn their substantial brainpower toward the strategic questions as well.
For three additional reasons for MIRI to focus on math for now, see the bullet points under “strategic research will consume a minority of our research budget in 2013” in MIRI’s Strategy for 2013.
what principles allow us to say that P!=NP is more likely to be true than false
Maybe we use the same principle that allows me to say “I guess I left my wallet at home” after I fail to find the wallet in the most likely places it could be, like my pockets. In other words, maybe we do Bayesian updating about the location of the “true” proof or disproof, as we check some a priori likely locations (attempted proofs and disproofs) and fail to find it there. This idea is still very vague, but looks promising to me because it doesn’t assume logical omniscience, unlike Abram’s and Benja’s ideas...
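The wallet version of this update can be sketched in a few lines of Python. This is only a toy illustration, not a proposal for how to handle logical uncertainty: the location names, the prior, and the 0.9 detection probability are all made-up assumptions, and the function name `update_after_failed_search` is hypothetical. How "locations" would map onto attempted proofs and disproofs is left open, as in the comment above.

```python
def update_after_failed_search(prior, searched, detect_prob=1.0):
    """Bayes-update a prior over locations after searching `searched` and finding nothing.

    `detect_prob` is the (assumed) chance that a search finds the item
    if it really is at the searched location.
    """
    if searched not in prior:
        raise KeyError(f"unknown location: {searched}")
    # Likelihood of "search came up empty" under each hypothesis about the true location.
    miss = {loc: (1.0 - detect_prob) if loc == searched else 1.0 for loc in prior}
    evidence = sum(prior[loc] * miss[loc] for loc in prior)
    if evidence == 0.0:
        raise ValueError("the failed search has probability zero under this prior")
    return {loc: prior[loc] * miss[loc] / evidence for loc in prior}


# Illustrative numbers only (assumptions, not data).
prior = {"pocket": 0.6, "bag": 0.3, "home": 0.1}
after_pocket = update_after_failed_search(prior, "pocket", detect_prob=0.9)
after_bag = update_after_failed_search(after_pocket, "bag", detect_prob=0.9)

for name, dist in [("prior", prior), ("after pocket", after_pocket), ("after bag", after_bag)]:
    print(name, {loc: round(p, 3) for loc, p in dist.items()})
```

With these numbers, "home" starts as the least likely location and ends as the most likely one after two failed searches, which is the "I guess I left my wallet at home" inference.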
I think I was implicitly assuming that you wouldn’t have an agent making decisions based purely on proofs.