As I discussed before, IMO the correct approach is not looking for the one “correct” prior, since there is no such thing, but specifying a “pure learning” phase in AI development. In the case of your example, we can imagine the operator overriding the agent’s controls and forcing it to produce various outputs in order to update away from Hell. Given a sufficiently long learning phase, all universal priors should converge to the same result (of course, if we start from a ridiculous universal prior it will take ridiculously long, so I still grant that there is a fuzzy domain of “good” universal priors).
As I discussed before, IMO the correct approach is not looking for the one “correct” prior, since there is no such thing, but specifying a “pure learning” phase in AI development.
I’m not sure about “no correct prior”, and even if there is no “correct prior”, maybe there is still “the right prior for me”, or “my actual prior”, which we can somehow determine or extract and build into an FAI?
In the case of your example, we can imagine the operator overriding the agent’s controls and forcing it to produce various outputs in order to update away from Hell.
How do you know when you’ve forced the agent to explore enough? What if the agent has a prior which assigns a large weight to an environment that’s indistinguishable from our universe, except that lots of good things happen if the sun gets blown up? It seems like the agent can’t update away from this during the training phase.
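To make the worry concrete: under Bayesian updating, two hypotheses that assign identical likelihoods to every observation available during training keep a constant posterior ratio, no matter how long the forced-exploration phase runs. A minimal sketch (the hypothesis names and likelihood values are illustrative, not from the discussion):

```python
def bayes_update(posterior, likelihoods, observation):
    """One Bayes step: reweight each hypothesis by its likelihood of the observation."""
    weights = {h: p * likelihoods[h](observation) for h, p in posterior.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# "our_universe" and "sun_hell" agree on every training observation; they only
# disagree about the (never observed) consequences of blowing up the sun.
likelihoods = {
    "our_universe": lambda obs: 0.5,  # identical predictions on training data...
    "sun_hell":     lambda obs: 0.5,  # ...so the likelihood ratio is always 1
}

posterior = {"our_universe": 0.9, "sun_hell": 0.1}
for obs in range(1000):  # an arbitrarily long forced-exploration phase
    posterior = bayes_update(posterior, likelihoods, obs)

print(posterior)  # still {'our_universe': 0.9, 'sun_hell': 0.1}
```

No amount of forcing changes the 9:1 ratio, because forcing only generates observations, and these two hypotheses agree on all of them.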
(of course, if we start from a ridiculous universal prior it will take ridiculously long, so I still grant that there is a fuzzy domain of “good” universal priors)
So you think “universal” isn’t “good enough”, but something more specific (but perhaps not unique as in “the correct prior” or “the right prior for me”) is? Can you try to define it?
I’m not sure about “no correct prior”, and even if there is no “correct prior”, maybe there is still “the right prior for me”, or “my actual prior”, which we can somehow determine or extract and build into an FAI?
This sounds much closer to home. Note, however, that there is a certain ambiguity between the prior and the utility function: UDT agents maximize Sum_x Prior(x) U(x), so certain simultaneous redefinitions of Prior and U will lead to the same thing.
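To spell out the ambiguity (the reweighting function f is my notation, not from the thread): multiplying the prior by any positive f(x) and dividing the utility by the same f(x) leaves the UDT objective unchanged, so the split between “beliefs” and “values” is underdetermined.

```latex
% Sketch of the prior/utility ambiguity, for an arbitrary f(x) > 0:
\[
  \mathrm{Prior}'(x) = \frac{\mathrm{Prior}(x)\,f(x)}{Z}, \qquad
  U'(x) = \frac{Z\,U(x)}{f(x)}, \qquad
  Z = \sum_x \mathrm{Prior}(x)\,f(x)
\]
% The objective is invariant, so (Prior, U) and (Prior', U') describe
% the same agent:
\[
  \sum_x \mathrm{Prior}'(x)\,U'(x)
    = \sum_x \frac{\mathrm{Prior}(x)\,f(x)}{Z}\cdot\frac{Z\,U(x)}{f(x)}
    = \sum_x \mathrm{Prior}(x)\,U(x)
\]
```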
But in that case, why do we need a special “pure learning” period where you force the agent to explore? Wouldn’t any prior that qualifies as “the right prior for me” or “my actual prior” avoid favoring any particular universe to such an extent that it prevents the agent from exploring in a reasonable way?
To recap: if we give the agent a “good” prior, then the agent will naturally explore/exploit in an optimal way without being forced to. If we give it a “bad” prior, then forcing it to explore during a pure learning period won’t help (enough), because there could be environments in the bad prior that can’t be updated away during the pure learning period and that cause disaster later. Maybe if we don’t know how to define a “good” prior, but there are “semi-good” priors which we know will reliably converge to a “good” one after a certain amount of forced exploration, then a pure learning phase would be useful; but nobody has proposed such a prior, AFAIK.
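Here is a toy sketch of that “semi-good” case (the arms, payoffs, and hard-refutation update are all illustrative, not a proposal from the thread): when the bad hypothesis would be refuted by an observable outcome, one forced exploration step is enough, while a free greedy agent never gathers the refuting evidence. This is exactly the situation where a pure learning phase buys something; the sun-explosion example above is the one where it doesn’t.

```python
def run(prior_b_is_trap: float, forced_pulls: int) -> float:
    """Greedy Bayesian agent on a two-armed bandit; arm A reliably pays 1."""
    belief = prior_b_is_trap  # P(pulling arm B yields -100 rather than +2)
    for t in range(100):
        expected_b = belief * (-100) + (1 - belief) * 2
        # Pull B only if the operator forces it, or if it looks better than A.
        arm = "B" if (t < forced_pulls or expected_b > 1) else "A"
        if arm == "B":
            # In the real environment B pays +2, an outcome the trap hypothesis
            # gave zero likelihood, so its posterior collapses to zero.
            belief = 0.0
    return belief

print(run(prior_b_is_trap=0.9, forced_pulls=0))  # 0.9 -- never explored, never updated
print(run(prior_b_is_trap=0.9, forced_pulls=1))  # 0.0 -- one forced pull sufficed
```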
If we find a mathematical formula describing the “subjectively correct” prior P and give it to the AI, the AI will still effectively use a different prior initially, namely the convolution of P with some kind of “logical uncertainty kernel”. IMO this means we still need a learning phase.
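One possible way to read “convolution with a logical uncertainty kernel” formally (this formalization is my guess at the intended meaning, not something stated in the thread): the agent’s effective prior is a mixture over candidate priors Q, weighted by a kernel K encoding its uncertainty about what the formula for P actually evaluates to.

```latex
% Hypothetical formalization: K weights the candidate priors Q the agent
% might, for all it can compute, be running.
\[
  P_{\mathrm{eff}}(x) \;=\; \sum_{Q} K(Q)\,Q(x),
  \qquad \sum_{Q} K(Q) = 1
\]
% Even if K concentrates on the true P given enough reasoning time, early
% decisions are made under P_eff rather than P -- hence the learning phase.
```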
In the post you linked to, at the end you mention a proposed “fetus” stage where the agent receives no external inputs. Did you ever write the posts describing it in more detail? I have to say my initial reaction to that idea is also skeptical, though. Humans don’t have a fetus stage where we think/learn about math with external inputs deliberately blocked off. Why do artificial agents need it? If an agent couldn’t simultaneously learn about math and process external inputs, it seems like something must be wrong with the basic design, which we should fix instead of working around.
I didn’t develop the idea, and I’m still not sure whether it’s correct. I’m planning to get back to these questions once I’m ready to use the theory of optimal predictors to put everything on a rigorous footing. So I’m not sure we really need to block the external inputs. However, note that the AI is in a sense more fragile than a human, since the AI is capable of self-modifying in irreversibly damaging ways.