This model seems quite a bit different from mine, which is that FAI research is about reducing FAI to an AGI problem, and solving AGI takes more work than doing this reduction.
More concretely, consider a proposal such as Paul’s reflective automated philosophy method, which might be able to be implemented using epsiodic reinforcement learning. This proposal has problems, and it’s not clear that it works—but if it did, then it would have reduced FAI to a reinforcement learning problem. Presumably, any implementations of this proposal would benefit from any reinforcement learning advances in the AGI field.
Of course, even if we a proposal like this works, it might require better or different AGI capabilities from UFAI projects. I expect this to be true for black-box FAI solutions such as Paul’s. This presents additional strategic difficulties. However, I think the post fails to accurately model these difficulties. The right answer here is to get AGI researchers to develop (and not publish anything about) enough AGI capabilities for FAI without running a UFAI in the meantime, even though the capabilities to run it exist.
Assuming that this reflective automated philosophy system doesn’t work, it could still be the case that there is a different reduction from FAI to AGI that can be created through armchair technical philosophy. This is often what MIRI’s “unbounded solutions” research is about: finding ways you could solve FAI if you had a hypercomputer. Once you find a solution like this, it might be possible to define it in terms of AGI capabilities instead of hypercomputation, and at that point FAI would be reduced to an AGI problem. We haven’t put enough work into this problem to know that a reduction couldn’t be created in, say, 20 years by 20 highly competent mathematician-philosophers.
In the most pessimistic case (which I don’t think is too likely), the task of reducing FAI to an AGI problem is significantly harder than creating AGI. In this case, the model in the post seems to be mostly accurate, except that it neglects the fact that serial advances might be important (so we get diminishing marginal progress towards FAI or AGI per additional researcher in a given year).
[Edited: replaced Gremaining with Fremaining, which is what I originally meant]
Thanks for the comment jessicat! I haven’t read those posts yet, will do more research on reducing FAI to an AGI problem.
A few responses & clarifications:
Our framework assumes the FAI research would happen before AGI creation. If we can research how to reduce FAI to an AGI problem in a way that would reliably make a future AGI friendly, then that amount of research would be our variable Fremaining. If that is quite easy to do, then that’s fantastic; an AI venture would have an easy time, and the leakage ratio would be low enough to not have to worry about. Additional required capabilities that we’ll find out we need would be added to Fremaining.
“I think the post fails to accurately model these difficulties.”
→ This post doesn’t attempt to model the individual challenges to understand how large Fremaining actually is. That’s probably a more important question than what we addressed, but one for a different model.
“The right answer here is to get AGI researchers to develop (and not publish anything about) enough AGI capabilities for FAI without running a UFAI in the meantime, even though the capabilities to run it exist.”
→ This paper definitely advocates for AGI researchers to develop FAI research while not publishing much AGI research. I agree that some internal AGI research will probably be necessary, but hope that it won’t be a whole lot. If the tools to create an AGI were figured out, even if they were kept secret by an FAI research group, I would be very scared. Those would be the most important and dangerous secrets of all time, and I doubt they could be kept secret for very long (20 years max?)
“In this case, the model in the post seems to be mostly accurate, except that it neglects the fact that serial advances might be important (so we get diminishing marginal progress towards FAI or AGI per additional researcher in a given year).”
-> This paper purposefully didn’t model research effort, but rather, abstract units of research significance.
“the numbers of rg and rf don’t perfectly correlate with the difficulty to reach them. It may be that we have diminishing marginal returns with our current levels of rg, so similar levels of rf will be easier to reach.”
A model that would also take into account the effort required would require a few more assumptions and additional complexity. I prefer to start simple and work from there, so we at least know what people do agree on before adding additional complexity.
Thanks for the detailed response! I do think the framework can still work with my assumptions. The way I would model it would be something like:
In the first stage, we have G->Fremaining (the research to an AGI->FAI solution) and Gremaining (the research to enough AGI for UFAI). I expect G->Fremaining < Gremaining, and a relatively low leakage ratio.
after we have AGI->FAI, we have Fremaining (the research for the AGI to input to the AGI->FAI) and Gremaning (the research to enough AGI for UFAI). I expect Fremaining > Gremaining, and furthermore I expect the leakage ratio to be high enough that we are practically guaranteed to have enough AGI capabilities for UFAI before FAI (though I don’t know how long before). Hence the strategic importance of developing AGI capabilities in secret, and not having them lying around for too long in too many hands. I don’t really see a way of avoiding this: the alternative is to have enough research to create FAI but not a paperclip maximizer, which seems implausible (though it would be really nice if we could get this state!).
Also, it seems I had misinterpreted the part about rg and rf, sorry about that!
I guess the most controversial, and hopefully false, assumption of this paper is #3:
‘If Gremaining is reached before Fremaining, a UFAI will be created. If after, an FAI will be created.’
This basically is the AI Foom scenario, where the moment an AGI is created, it will either kill us or all or bring about utopia (or both).
If this is not the case, and we have a long time to work with the AGI as it develops to make sure it is friendly, then this model isn’t very useful.
If we do assume these assumptions, I would also expect that we will reach Gremaining before Fremaining, or at least that a private organization will end up doing so. However, I am also very skeptical in the power of secrets. I think I find us reaching Fremaining first more likely than a private institution reaching Gremaining first, but hiding it until it later reaches Fremaining, though both may be very slim. If the US military or a similar group with a huge technological and secretive advantage were doing this, there could be more of a chance. This definitely seems like a game of optimizing small probabilities.
Either way, I think we definitely would agree here that the organization developing these secrets can strategically choose projects that deliver the high amounts of FAI research relative to the amount AGI research they will have to keep secretive. Begin with the easy, non-secretive wins and work from there.
We may need the specific technology to create a paperclip maximizer before we make an FAI, but if we plan correctly, we hopefully will be really close to reaching an FAI by that point.
This basically is the AI Foom scenario, where the moment an AGI is created, it will either kill us or all or bring about utopia (or both).
The question is not “if”. The questions are “how quickly” and “to what height”. An AI capable of self-improving to world-destroying levels within moments is plainly unrealistic. An AI capable of self-improving to dangerous levels (viz: levels where it can start making humans do the dangerous work for it) in the weeks, months, or even years it would take a team of human operators to cross-examine the formally unspecified motivation engines for Friendliness is dangerously realistic.
I feel like this still fits their model pretty well. You need Fremaining time to find AGI->FAI solution, while there is Gremaining time before someone builds an AGI.
This is often what MIRI’s “unbounded solutions” research is about: finding ways you could solve FAI if you had a hypercomputer.
Sorry to criticize out of the blue, but I think that’s a very bad idea. To wit, “Assume a contradiction, prove False, and ex falso quodlibet.” If you start by assuming a hypercomputer and reason mathematically from there, I think you’ll mostly derive paradox theorems and contradictions.
I should be specific that the kinds of results we want to get are those where you could, in principle, use a very powerful computer instead of a hypercomputer. Roughly, the unbounded algorithm should be a limit of bounded algorithms. The kinds of allowed operations I am thinking about include:
Solomonoff induction
optimizing an arbitrary function
evaluating an arbitrary probabilistic program
finding a proof of X if one exists
solving an infinite system of equations that is guaranteed to have a solution
In all these cases, you can get arbitrarily good approximations using bounded algorithms, although they might require a very large amount of computation power. I don’t think things like this would lead to contradictions if you did them correctly.
Ah, ok. So you’re saying, “Let’s do FAI by first assuming we have an incomplete infinity of processing power to apply—thus assuming the Most Powerful Possible Agent as our ‘adversary’ to be ‘tamed’.” Hence the continual use of AIXI?
Yes, something like that, although I don’t usually think of it as an adversary. Mainly it’s so I can ask questions like “how could a FAI model its operator so that it can infer the operator’s values from their behavior?” without getting hung up on the exact representation of the model or how the model is found. We don’t have any solution to this problem, even if we had access to a probabilistic program induction black box, so it would be silly to impose the additional restriction that we can’t give the black box any induction problems that are too hard.
That said, bounded algorithms can be useful as inspiration, even for unbounded problems. For example, I’m currently looking at ways you could use probabilistic programs with nested queries to model Vingean reflection.
For example, I’m currently looking at ways you could use probabilistic programs with nested queries to model Vingean reflection.
ZOMFG, can you link to a write-up? This links up almost perfectly with a bit of research I’ve been wanting to do.
Yes, something like that, although I don’t usually think of it as an adversary.
I more meant “adversary” in crypto terms: something that can and will throw behavior at us we don’t want unless we formally demonstrate that it can’t.
That said, bounded algorithms can be useful as inspiration, even for unbounded problems.
I have a slightly different perspective on the bounded/unbounded issue. Have you ever read Jaynes’ Probability Theory? Well, I never got up to the part where he undoes paradoxes, but the way he preached about it sunk in: a paradox will often arise because you passed to the limit too early in your proof or construction. I’ve also been very impressed by the degree to which resource-rational and bounded-rational models of cognition explain facts about real minds that unbounded models either can’t explain at all or write off as “irrational”.
To quote myself (because it’s applicable here but the full text isn’t done):
The key is that AIXI evaluates K(x), the Kolmogorov complexity of each possible Turing-machine program. This function allows a Solomonoff Inducer to perfectly separate the random information in its sensory data from the structural information, yielding an optimal distribution over representations that contain nothing but causal structure. This is incomputable, or requires infinite algorithmic information—AIXI can update optimally on sensory information by falling back on its infinite computing power.
In my perspective, at least, AIXI is cheating by assuming unbounded computational power, with the result that even the “bounded” and “approximate” AIXI_{tl} runs in “optimal” time modulo an astronomically-large additive constant. So I think that a “bottom-up” theory of bounded-rational reasoning or resource-rational reasoning—one that starts with the assumption we have strictly bounded finite compute-power the same way probability theory assumes we have strictly bounded finite information—will work a lot better to explain how to scale up by “passing to the limit” at the last step.
Which then goes to that research I want to do: I think we could attack logical uncertainty and probabilistic reflection by finding a theory for how to trade finite amounts of compute time for finite amounts of algorithmic information. The structure currently in my imagination is a kind of probability mixed with domain theory: the more computing power you add, the more certain you can become about the results of computations, even if you still have to place some probability mass on \Bot (bottom). In fact, if you find over time that you place more probability mass on \Bot, then you’re acquiring a degree of belief that the computation in question won’t terminate.
I think this would then mix with probabilistic programming fairly well, and also have immediate applications to assigning rational, well-behaved degrees of belief to “weird” propositions like Goedel Sentences or Halting predicates.
(BTW: here’s a writeup of one of my ideas for writing planning queries that you might be interested in)
Often we want a model where the probability of taking action a is proportional to p(a)e^E[U(x, a)], where p is the prior over actions, x consists of some latent variables, and U is the utility function. The straightforward way of doing this fails:
query {
. a ~ p()
. x ~ P(x)
. factor(U(x, a))
}
Note that I’m assuming factor takes a log probability as its argument. This fails due to “wishful thinking”: it tends to prefer riskier actions. The problem can be reduced by taking more samples:
query {
. a ~ p()
. us = []
. for i = 1 to n
. . x_i ~ P(x)
. . us.append(U(x_i, a))
. factor(mean(us))
}
This does better, because since we took multiple samples, mean(us) is likely to be somewhat accurate. But how do we know how many samples to take? The exact query we want cannot be expressed with any finite n.
It turns out that we just need to sample n from a Poisson distribution and make some more adjustments:
query {
. a ~ p()
. n ~ Poisson(1)
. for i = 1 to n
. . x_i ~ P(x)
. . factor(log U(x_i, a))
}
Note that U must be non-negative. Why does this work? Consider:
P(a) α p(a) E[e^sum(log U(x_i, a) for i in range(n))]
= p(a) E[prod(U(x_i, a) for i in range(n))]
= p(a) E[ E[prod(U(x_i, a) for i in range(n)) | n] ]
[here use the fact that the terms in the product are independent]
= p(a) E[ E[U(x, a)]^n ]
= p(a) sum(i=0 to infinity) E[U(x, a)]^i / i!
[Taylor series!]
= p(a) e^E[U(x, a)]
Ideally, this technique would help to perform inference in planning models where we can’t enumerate all possible states.
Your model selects an action proportional to p(a) E[sigmoid(U) | a], whereas mine selects an action proportional to p(a) e^E[U | a]. I think the second is better, because it actually treats actions the same if they have the same expected utility. The sigmoid version will not take very high utilities or very low utilities into account much.
Btw it’s also possible to select an action proportional to E[U | a]^n:
query {
. a ~ p()
. for i = 1 to n
. . x_i ~ P(x)
. . factor(log U(x, a))
}
Could you explain your syntax here? What probabilistic programming language are you using?
I think the second is better, because it actually treats actions the same if they have the same expected utility.
Well so does the sigmoided version, but you are right that the sigmoid version won’t take very high or very low utilities into account. It’s meant to shoehorn unbounded utility functions into a framework where one normally works only with random variables.
ZOMFG, can you link to a write-up? This links up almost perfectly with a bit of research I’ve been wanting to do.
Well, a write-up doesn’t exist because I haven’t actually done the math yet :)
But the idea is about algorithms for doing nested queries. There’s a planning framework where you take action a proportional to p(a) e^E[U | a]. If one of these actions is “defer to your successor”, then the computation of (U | a) is actually another query that samples a different action b proportional to p(b) e^E[U | b]. In this case you can actually just go ahead and convert the resulting nested query to a 1-level query: you can convert a “softmax of softmax” into a regular softmax, if that makes sense.
This isn’t doing Vingean reflection, because it’s actually doing all the computational work that its successor would have to do. So I’m interested in ways to simplify computationally expensive nested queries into approximate computationally cheap single queries.
Here’s a simple example of why I think this might be possible. Suppose I flip a coin to decide whether the SAT problem I generate has a solution or not. Then I run a nested query to generate a SAT problem that either does or does not have a solution (depending on the original coin flip). Then I hand you the problem, and you have to guess whether it has a solution or not. I check your solution using a query to find the solution to the problem.
If you suck at solving SAT problems, your best bet might just be to guess that there’s a 50% chance that the problem is solveable. You could get this kind of answer by refactoring the complicated nested nested query model into a non-nested model and then noting that the SAT problem itself gives you very little information about whether it is solveable (subject to your computational constraints).
I’m thinking of figuring out the math here better and then applying it to things like planning queries where your successor has a higher rationality parameter than you (an agent with rationality parameter α takes action a with probability proportional to p(a) e^(α * E[U | a]) ). The goal would be to formalize some agent that, for example, generally chooses to defer to a successor who has a higher rationality parameter, unless there is some cost for deferring, in which case it may defer or not depending on some approximation of value of information.
Your project about trading computing power for algorithmic information seems interesting and potentially related, and I’d be interested in seeing any results you come up with.
even if you still have to place some probability mass on \Bot (bottom)
Is this because you assign probability mass to inconsistent theories that you don’t know are inconsistent?
When you use e-raised-to-the alpha times expectation, is that similar to the use of an exponential distribution in something like Adaboost, to take something like odds information and form a distribution over assorted weights? I have work to do, but will be giving your little write-up here a second read-through soon.
Is this because you assign probability mass to inconsistent theories that you don’t know are inconsistent?
The idea isn’t to assign probability mass to logical theories, but to the outcomes of computations in general. This is partly because computations-in-general strictly contains encodings of all possible proof systems, but also because, if we’re building algorithms that have to confront a Turing-complete environment, the environment may sometimes contain nontrivially nonhalting computations, which can’t be logically proved not to terminate. Since any realistic agent needs to be able to handle whatever its environment throws at it, it seems to follow that a realistic agent needs some resource-rational way to handle nonprovable nonhalting.
When you use e-raised-to-the alpha times expectation, is that similar to the use of an exponential distribution in something like Adaboost, to take something like odds information and form a distribution over assorted weights?
I’m not really that familiar with adaboost. The planning model is just reflecting the fact that bounded agents don’t always take the maximum expected utility action. The higher alpha is, the more bias there is towards good actions, but the more potentially expensive the computation is (e.g. if you use rejection sampling).
Since any realistic agent needs to be able to handle whatever its environment throws at it, it seems to follow that a realistic agent needs some resource-rational way to handle nonprovable nonhalting.
Ah, that makes sense! I think I see how “trading computational power for algorithmic information” makes sense in this framework.
The planning model is just reflecting the fact that bounded agents don’t always take the maximum expected utility action. The higher alpha is, the more bias there is towards good actions, but the more potentially expensive the computation is (e.g. if you use rejection sampling).
Ah, that makes sense!
Ah, that makes sense! I think I see how “trading computational power for algorithmic information” makes sense in this framework.
And before I could scribble a damned thing, Calude went and solved it six months ago. The Halting Problem, I mean.
Cool. If I get the meaning of the result well, it’s that if you run a random program for some number of steps and it doesn’t halt, then (depending on the exact numbers) it will be unlikely to halt when run on a supercomputer either, because halting times have low density. So almost all programs halt quickly or run a really really long time. Is this correct? This doesn’t quite let you approximate Chaitin’s omega, but it’s interesting that you can approximate a bounded variant of Chaitin’s omega (like what percentage of Turing machines halt when run for 10^50 steps). I can see how this would let you solve the halting problem well enough when you live in a bounded universe.
Cool. If I get the meaning of the result well, it’s that if you run a random program for some number of steps and it doesn’t halt, then (depending on the exact numbers) it will be unlikely to halt when run on a supercomputer either, because halting times have low density. So almost all programs halt quickly or run a really really long time. Is this correct?
Yep. Or, to put it a little tiny bit more accurately, you get a halting probability for your particular Turing machine, conditioned on the number of steps for which you’ve run it.
This doesn’t quite let you approximate Chaitin’s omega
Well technically, you can approximate Chaitin’s omega from below just as Chaitin himself describes in his book. You’ll just only be able to calculate finitely many digits.
I can see how this would let you solve the halting problem well enough when you live in a bounded universe.
Which we do ;-). You could run until you get past a desired threshold of probability (hypothesis testing), or you could use a bounded-rationality approach to vary your surety of nonhalting with your available processing power.
But overall, it gives you a way to “reason around” the Halting Problem, which, when we apply it to the various paradoxes of self-reference… you can see where I’m going with this.
I’m thinking of figuring out the math here better and then applying it to things like planning queries where your successor has a higher rationality parameter than you (an agent with rationality parameter α takes action a with probability proportional to p(a) e^(α * E[U | a]) ). The goal would be to formalize some agent that, for example, generally chooses to defer to a successor who has a higher rationality parameter, unless there is some cost for deferring, in which case it may defer or not depending on some approximation of value of information.
How does this deal with the Paradox of Procrastination?
Due to the planning model, the successor always has some nonzero probability of not pressing the button, so (depending on how much you value pressing it later) it’ll be worth it to press it at some point.
A hypercomputer is a computer that can deterministically decide the Halting Problem for a Turing machine in finite time. We already know that this is physically impossible.
And unfortunately, most of the FAI work I’ve seen under the assumption of having a hypercomputer tends to end up along the lines of, “We started by assuming we had a Turing Oracle, and proved that given a second-level Turing Oracle, we can implement UDT with blah blah blah.”
We don’t know that it’s physically impossible, although it does look that way, but even if we did that doesn’t mean it’s contradictory, not to the extent that using it you’ll “mostly derive paradox theorems and contradictions”.
If Turing oracles are not physically impossible, then we need an explanation for how physics implements an infinite tower of Turing oracle levels. Short of that, I’m going to believe Turing oracles are impossible.
even if we did that doesn’t mean it’s contradictory, not to the extent that using it you’ll “mostly derive paradox theorems and contradictions”.
If you start with something undecidable and build on it, you usually find that your results are even more undecidable (require a higher level of Turing oracle). There’s also the AIT angle, which says that a true Turing oracle possesses infinite Kolmogorov complexity, and since Shannon entropy is the expected-value of Kolmogorov complexity, and Shannon entropy is closely related to physical entropy… we have strong reason to think that a Turing oracle violates basic thermodynamics.
Why do we need the full tower? Why couldn’t it be the case that just one (or some other finite number) of the Turing Oracle levels are physically possible?
Effectively, there is either some natural number n such that physics allows for n levels of physically-implementable Turing oracles, or the number is omega. Mostly, we think the number should either be zero or omega, because once you have a first-level Turing Oracle, you construct the next level just by phrasing the Halting Problem for Turing Machines with One Oracle, and then positing an oracle for that, and so on.
Likewise, having omega (cardinality of the natural numbers) bits of algorithmic information is equivalent to having a first-level Turing Oracle (knowing the value of Chaitin’s Omega completely). From there, you start needing larger and larger infinities of bits to handle higher levels of the Turing hierarchy.
So the question is: how large a set of bits can physics allow us to compute with? Possible answers are:
Finite only. This is what we currently believe.
Countably infinite (Alef zero) or Continuum infinite (Alef one). Playing time-dilation games with General Relativity might, in certain funky situations I don’t quite understand but which form the basis of some science fiction, almost allow you to get up to here. But it would require negative energy or infinite mass or things of that nature.
Arbitrarily large infinities. Almost definitely not.
Omega: if we’re completely wrong about the relationship between computation and physics as we know it, possible.
This model seems quite a bit different from mine, which is that FAI research is about reducing FAI to an AGI problem, and solving AGI takes more work than doing this reduction.
More concretely, consider a proposal such as Paul’s reflective automated philosophy method, which might be able to be implemented using epsiodic reinforcement learning. This proposal has problems, and it’s not clear that it works—but if it did, then it would have reduced FAI to a reinforcement learning problem. Presumably, any implementations of this proposal would benefit from any reinforcement learning advances in the AGI field.
Of course, even if we a proposal like this works, it might require better or different AGI capabilities from UFAI projects. I expect this to be true for black-box FAI solutions such as Paul’s. This presents additional strategic difficulties. However, I think the post fails to accurately model these difficulties. The right answer here is to get AGI researchers to develop (and not publish anything about) enough AGI capabilities for FAI without running a UFAI in the meantime, even though the capabilities to run it exist.
Assuming that this reflective automated philosophy system doesn’t work, it could still be the case that there is a different reduction from FAI to AGI that can be created through armchair technical philosophy. This is often what MIRI’s “unbounded solutions” research is about: finding ways you could solve FAI if you had a hypercomputer. Once you find a solution like this, it might be possible to define it in terms of AGI capabilities instead of hypercomputation, and at that point FAI would be reduced to an AGI problem. We haven’t put enough work into this problem to know that a reduction couldn’t be created in, say, 20 years by 20 highly competent mathematician-philosophers.
In the most pessimistic case (which I don’t think is too likely), the task of reducing FAI to an AGI problem is significantly harder than creating AGI. In this case, the model in the post seems to be mostly accurate, except that it neglects the fact that serial advances might be important (so we get diminishing marginal progress towards FAI or AGI per additional researcher in a given year).
[Edited: replaced Gremaining with Fremaining, which is what I originally meant]
Thanks for the comment jessicat! I haven’t read those posts yet, will do more research on reducing FAI to an AGI problem.
A few responses & clarifications:
Our framework assumes the FAI research would happen before AGI creation. If we can research how to reduce FAI to an AGI problem in a way that would reliably make a future AGI friendly, then that amount of research would be our variable Fremaining. If that is quite easy to do, then that’s fantastic; an AI venture would have an easy time, and the leakage ratio would be low enough to not have to worry about. Additional required capabilities that we’ll find out we need would be added to Fremaining.
“I think the post fails to accurately model these difficulties.” → This post doesn’t attempt to model the individual challenges to understand how large Fremaining actually is. That’s probably a more important question than what we addressed, but one for a different model.
“The right answer here is to get AGI researchers to develop (and not publish anything about) enough AGI capabilities for FAI without running a UFAI in the meantime, even though the capabilities to run it exist.” → This paper definitely advocates for AGI researchers to develop FAI research while not publishing much AGI research. I agree that some internal AGI research will probably be necessary, but hope that it won’t be a whole lot. If the tools to create an AGI were figured out, even if they were kept secret by an FAI research group, I would be very scared. Those would be the most important and dangerous secrets of all time, and I doubt they could be kept secret for very long (20 years max?)
“In this case, the model in the post seems to be mostly accurate, except that it neglects the fact that serial advances might be important (so we get diminishing marginal progress towards FAI or AGI per additional researcher in a given year).”
-> This paper purposefully didn’t model research effort, but rather, abstract units of research significance. “the numbers of rg and rf don’t perfectly correlate with the difficulty to reach them. It may be that we have diminishing marginal returns with our current levels of rg, so similar levels of rf will be easier to reach.”
A model that would also take into account the effort required would require a few more assumptions and additional complexity. I prefer to start simple and work from there, so we at least know what people do agree on before adding additional complexity.
Thanks for the detailed response! I do think the framework can still work with my assumptions. The way I would model it would be something like:
In the first stage, we have G->Fremaining (the research to an AGI->FAI solution) and Gremaining (the research to enough AGI for UFAI). I expect G->Fremaining < Gremaining, and a relatively low leakage ratio.
after we have AGI->FAI, we have Fremaining (the research for the AGI to input to the AGI->FAI) and Gremaning (the research to enough AGI for UFAI). I expect Fremaining > Gremaining, and furthermore I expect the leakage ratio to be high enough that we are practically guaranteed to have enough AGI capabilities for UFAI before FAI (though I don’t know how long before). Hence the strategic importance of developing AGI capabilities in secret, and not having them lying around for too long in too many hands. I don’t really see a way of avoiding this: the alternative is to have enough research to create FAI but not a paperclip maximizer, which seems implausible (though it would be really nice if we could get this state!).
Also, it seems I had misinterpreted the part about rg and rf, sorry about that!
Good point.
I guess the most controversial, and hopefully false, assumption of this paper is #3: ‘If Gremaining is reached before Fremaining, a UFAI will be created. If after, an FAI will be created.’
This basically is the AI Foom scenario, where the moment an AGI is created, it will either kill us or all or bring about utopia (or both).
If this is not the case, and we have a long time to work with the AGI as it develops to make sure it is friendly, then this model isn’t very useful.
If we do assume these assumptions, I would also expect that we will reach Gremaining before Fremaining, or at least that a private organization will end up doing so. However, I am also very skeptical in the power of secrets. I think I find us reaching Fremaining first more likely than a private institution reaching Gremaining first, but hiding it until it later reaches Fremaining, though both may be very slim. If the US military or a similar group with a huge technological and secretive advantage were doing this, there could be more of a chance. This definitely seems like a game of optimizing small probabilities.
Either way, I think we definitely would agree here that the organization developing these secrets can strategically choose projects that deliver the high amounts of FAI research relative to the amount AGI research they will have to keep secretive. Begin with the easy, non-secretive wins and work from there.
We may need the specific technology to create a paperclip maximizer before we make an FAI, but if we plan correctly, we hopefully will be really close to reaching an FAI by that point.
The question is not “if”. The questions are “how quickly” and “to what height”. An AI capable of self-improving to world-destroying levels within moments is plainly unrealistic. An AI capable of self-improving to dangerous levels (viz: levels where it can start making humans do the dangerous work for it) in the weeks, months, or even years it would take a team of human operators to cross-examine the formally unspecified motivation engines for Friendliness is dangerously realistic.
I feel like this still fits their model pretty well. You need Fremaining time to find AGI->FAI solution, while there is Gremaining time before someone builds an AGI.
Sorry to criticize out of the blue, but I think that’s a very bad idea. To wit, “Assume a contradiction, prove False, and ex falso quodlibet.” If you start by assuming a hypercomputer and reason mathematically from there, I think you’ll mostly derive paradox theorems and contradictions.
I should be specific that the kinds of results we want to get are those where you could, in principle, use a very powerful computer instead of a hypercomputer. Roughly, the unbounded algorithm should be a limit of bounded algorithms. The kinds of allowed operations I am thinking about include:
Solomonoff induction
optimizing an arbitrary function
evaluating an arbitrary probabilistic program
finding a proof of X if one exists
solving an infinite system of equations that is guaranteed to have a solution
In all these cases, you can get arbitrarily good approximations using bounded algorithms, although they might require a very large amount of computation power. I don’t think things like this would lead to contradictions if you did them correctly.
Ah, ok. So you’re saying, “Let’s do FAI by first assuming we have an incomplete infinity of processing power to apply—thus assuming the Most Powerful Possible Agent as our ‘adversary’ to be ‘tamed’.” Hence the continual use of AIXI?
Yes, something like that, although I don’t usually think of it as an adversary. Mainly it’s so I can ask questions like “how could a FAI model its operator so that it can infer the operator’s values from their behavior?” without getting hung up on the exact representation of the model or how the model is found. We don’t have any solution to this problem, even if we had access to a probabilistic program induction black box, so it would be silly to impose the additional restriction that we can’t give the black box any induction problems that are too hard.
That said, bounded algorithms can be useful as inspiration, even for unbounded problems. For example, I’m currently looking at ways you could use probabilistic programs with nested queries to model Vingean reflection.
ZOMFG, can you link to a write-up? This links up almost perfectly with a bit of research I’ve been wanting to do.
I more meant “adversary” in crypto terms: something that can and will throw behavior at us we don’t want unless we formally demonstrate that it can’t.
I have a slightly different perspective on the bounded/unbounded issue. Have you ever read Jaynes’ Probability Theory? Well, I never got up to the part where he undoes paradoxes, but the way he preached about it sunk in: a paradox will often arise because you passed to the limit too early in your proof or construction. I’ve also been very impressed by the degree to which resource-rational and bounded-rational models of cognition explain facts about real minds that unbounded models either can’t explain at all or write off as “irrational”.
To quote myself (because it’s applicable here but the full text isn’t done):
In my perspective, at least, AIXI is cheating by assuming unbounded computational power, with the result that even the “bounded” and “approximate” AIXI_{tl} runs in “optimal” time modulo an astronomically-large additive constant. So I think that a “bottom-up” theory of bounded-rational reasoning or resource-rational reasoning—one that starts with the assumption we have strictly bounded finite compute-power the same way probability theory assumes we have strictly bounded finite information—will work a lot better to explain how to scale up by “passing to the limit” at the last step.
Which then goes to that research I want to do: I think we could attack logical uncertainty and probabilistic reflection by finding a theory for how to trade finite amounts of compute time for finite amounts of algorithmic information. The structure currently in my imagination is a kind of probability mixed with domain theory: the more computing power you add, the more certain you can become about the results of computations, even if you still have to place some probability mass on \Bot (bottom). In fact, if you find over time that you place more probability mass on \Bot, then you’re acquiring a degree of belief that the computation in question won’t terminate.
I think this would then mix with probabilistic programming fairly well, and also have immediate applications to assigning rational, well-behaved degrees of belief to “weird” propositions like Goedel Sentences or Halting predicates.
(BTW: here’s a writeup of one of my ideas for writing planning queries that you might be interested in)
Often we want a model where the probability of taking action a is proportional to p(a)e^E[U(x, a)], where p is the prior over actions, x consists of some latent variables, and U is the utility function. The straightforward way of doing this fails:
Note that I’m assuming factor takes a log probability as its argument. This fails due to “wishful thinking”: it tends to prefer riskier actions. The problem can be reduced by taking more samples:
This does better, because since we took multiple samples, mean(us) is likely to be somewhat accurate. But how do we know how many samples to take? The exact query we want cannot be expressed with any finite n.
It turns out that we just need to sample n from a Poisson distribution and make some more adjustments:
Note that U must be non-negative. Why does this work? Consider:
Ideally, this technique would help to perform inference in planning models where we can’t enumerate all possible states.
Interesting! How does that compare to the usual implementations of planning as probabilistic inference, as exemplified below?
Your model selects an action proportional to p(a) E[sigmoid(U) | a], whereas mine selects an action proportional to p(a) e^E[U | a]. I think the second is better, because it actually treats actions the same if they have the same expected utility. The sigmoid version will not take very high utilities or very low utilities into account much.
Btw it’s also possible to select an action proportional to E[U | a]^n:
Could you explain your syntax here? What probabilistic programming language are you using?
Well so does the sigmoided version, but you are right that the sigmoid version won’t take very high or very low utilities into account. It’s meant to shoehorn unbounded utility functions into a framework where one normally works only with random variables.
It’s not a specific programming language, I guess it’s meant to look like Church. It could be written as:
It samples an action proportional to p(a) E[sigmoid(U) | a]. This can’t be written as a function of E[U | a].
Well, a write-up doesn’t exist because I haven’t actually done the math yet :)
But the idea is about algorithms for doing nested queries. There’s a planning framework where you take action a proportional to p(a) e^E[U | a]. If one of these actions is “defer to your successor”, then the computation of (U | a) is actually another query that samples a different action b proportional to p(b) e^E[U | b]. In this case you can actually just go ahead and convert the resulting nested query to a 1-level query: you can convert a “softmax of softmax” into a regular softmax, if that makes sense.
This isn’t doing Vingean reflection, because it’s actually doing all the computational work that its successor would have to do. So I’m interested in ways to simplify computationally expensive nested queries into approximate computationally cheap single queries.
Here’s a simple example of why I think this might be possible. Suppose I flip a coin to decide whether the SAT problem I generate has a solution or not. Then I run a nested query to generate a SAT problem that either does or does not have a solution (depending on the original coin flip). Then I hand you the problem, and you have to guess whether it has a solution or not. I check your solution using a query to find the solution to the problem.
If you suck at solving SAT problems, your best bet might just be to guess that there’s a 50% chance that the problem is solveable. You could get this kind of answer by refactoring the complicated nested nested query model into a non-nested model and then noting that the SAT problem itself gives you very little information about whether it is solveable (subject to your computational constraints).
I’m thinking of figuring out the math here better and then applying it to things like planning queries where your successor has a higher rationality parameter than you (an agent with rationality parameter α takes action a with probability proportional to p(a) e^(α * E[U | a]) ). The goal would be to formalize some agent that, for example, generally chooses to defer to a successor who has a higher rationality parameter, unless there is some cost for deferring, in which case it may defer or not depending on some approximation of value of information.
Your project about trading computing power for algorithmic information seems interesting and potentially related, and I’d be interested in seeing any results you come up with.
Is this because you assign probability mass to inconsistent theories that you don’t know are inconsistent?
When you use e-raised-to-the alpha times expectation, is that similar to the use of an exponential distribution in something like Adaboost, to take something like odds information and form a distribution over assorted weights? I have work to do, but will be giving your little write-up here a second read-through soon.
The idea isn’t to assign probability mass to logical theories, but to the outcomes of computations in general. This is partly because computations-in-general strictly contains encodings of all possible proof systems, but also because, if we’re building algorithms that have to confront a Turing-complete environment, the environment may sometimes contain nontrivially nonhalting computations, which can’t be logically proved not to terminate. Since any realistic agent needs to be able to handle whatever its environment throws at it, it seems to follow that a realistic agent needs some resource-rational way to handle nonprovable nonhalting.
I’m not really that familiar with adaboost. The planning model is just reflecting the fact that bounded agents don’t always take the maximum expected utility action. The higher alpha is, the more bias there is towards good actions, but the more potentially expensive the computation is (e.g. if you use rejection sampling).
Ah, that makes sense! I think I see how “trading computational power for algorithmic information” makes sense in this framework.
Ah, that makes sense!
And before I could scribble a damned thing, Calude went and solved it six months ago. The Halting Problem, I mean.
I wonder how he feels about that, because my current feeling about this is HOLY FUCKING SHIT. By GOD, my AIT textbook cannot get here fast enough.
Cool. If I get the meaning of the result well, it’s that if you run a random program for some number of steps and it doesn’t halt, then (depending on the exact numbers) it will be unlikely to halt when run on a supercomputer either, because halting times have low density. So almost all programs halt quickly or run a really really long time. Is this correct? This doesn’t quite let you approximate Chaitin’s omega, but it’s interesting that you can approximate a bounded variant of Chaitin’s omega (like what percentage of Turing machines halt when run for 10^50 steps). I can see how this would let you solve the halting problem well enough when you live in a bounded universe.
Yep. Or, to put it a little tiny bit more accurately, you get a halting probability for your particular Turing machine, conditioned on the number of steps for which you’ve run it.
Well technically, you can approximate Chaitin’s omega from below just as Chaitin himself describes in his book. You’ll just only be able to calculate finitely many digits.
Which we do ;-). You could run until you get past a desired threshold of probability (hypothesis testing), or you could use a bounded-rationality approach to vary your surety of nonhalting with your available processing power.
But overall, it gives you a way to “reason around” the Halting Problem, which, when we apply it to the various paradoxes of self-reference… you can see where I’m going with this.
You can always acquire it nautically to cut down on shipping time :P
What one did you get? I went with Li and Vitanyi, and have enjoyed my first readthrough.
The first one I bought was Calude’s, but it demands more background in pure maths than I have, so I went for Li and Vitanyi’s.
How does this deal with the Paradox of Procrastination?
Due to the planning model, the successor always has some nonzero probability of not pressing the button, so (depending on how much you value pressing it later) it’ll be worth it to press it at some point.
Why do you think that a hypercomputer is inherently contradictory?
A hypercomputer is a computer that can deterministically decide the Halting Problem for a Turing machine in finite time. We already know that this is physically impossible.
And unfortunately, most of the FAI work I’ve seen under the assumption of having a hypercomputer tends to end up along the lines of, “We started by assuming we had a Turing Oracle, and proved that given a second-level Turing Oracle, we can implement UDT with blah blah blah.”
We don’t know that it’s physically impossible, although it does look that way, but even if we did that doesn’t mean it’s contradictory, not to the extent that using it you’ll “mostly derive paradox theorems and contradictions”.
If Turing oracles are not physically impossible, then we need an explanation for how physics implements an infinite tower of Turing oracle levels. Short of that, I’m going to believe Turing oracles are impossible.
If you start with something undecidable and build on it, you usually find that your results are even more undecidable (require a higher level of Turing oracle). There’s also the AIT angle, which says that a true Turing oracle possesses infinite Kolmogorov complexity, and since Shannon entropy is the expected-value of Kolmogorov complexity, and Shannon entropy is closely related to physical entropy… we have strong reason to think that a Turing oracle violates basic thermodynamics.
Why do we need the full tower? Why couldn’t it be the case that just one (or some other finite number) of the Turing Oracle levels are physically possible?
Effectively, there is either some natural number
n
such that physics allows forn
levels of physically-implementable Turing oracles, or the number is omega. Mostly, we think the number should either be zero or omega, because once you have a first-level Turing Oracle, you construct the next level just by phrasing the Halting Problem for Turing Machines with One Oracle, and then positing an oracle for that, and so on.Likewise, having omega (cardinality of the natural numbers) bits of algorithmic information is equivalent to having a first-level Turing Oracle (knowing the value of Chaitin’s Omega completely). From there, you start needing larger and larger infinities of bits to handle higher levels of the Turing hierarchy.
So the question is: how large a set of bits can physics allow us to compute with? Possible answers are:
Finite only. This is what we currently believe.
Countably infinite (Alef zero) or Continuum infinite (Alef one). Playing time-dilation games with General Relativity might, in certain funky situations I don’t quite understand but which form the basis of some science fiction, almost allow you to get up to here. But it would require negative energy or infinite mass or things of that nature.
Arbitrarily large infinities. Almost definitely not.
Omega: if we’re completely wrong about the relationship between computation and physics as we know it, possible.