The theorem is consistent with the aliens causing trouble any finite number of times. But each time they cause the agent to do something weird their model loses some probability, so there will be some episode after which they stop causing trouble (if we manage to successfully run enough episodes without in fact having anything bad happen in the meantime, which is an assumption of the asymptotic arguments).
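The weight-loss dynamic above can be illustrated with a toy Bayesian mixture (my construction for illustration, not the paper's actual setup): every episode in which the malign model's prediction differs from what actually happens, a Bayes update multiplies its posterior weight by a factor less than 1, so the number of episodes in which it retains any non-negligible share is logarithmic in the chosen threshold.

```python
# Toy illustration (not the paper's construction): a "malign" model loses
# a constant factor of posterior weight every episode in which it predicts
# differently from what actually happens, so it can only cause trouble a
# finite number of times. All numbers here are arbitrary illustrative values.

w_malign = 0.01   # hypothetical prior weight on the malign model
w_true = 0.99     # prior weight on the model matching reality
q = 0.5           # probability the malign model assigned to what actually
                  # occurred in a "trouble" episode (the true model assigns 1)

trouble_episodes = 0
# The malign model can sway the agent only while its posterior share is
# non-negligible; 1e-6 is an arbitrary cutoff for "negligible".
while w_malign / (w_malign + w_true) > 1e-6:
    w_malign *= q   # Bayes update after a mispredicted episode
    trouble_episodes += 1

print(trouble_episodes)  # 14 -- grows like log(1/threshold), not exponentially
```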
Thanks. Is there a way to derive a concrete bound on how long it will take for BoMAI to become “benign”, e.g., is it exponential or something more reasonable? (Although if even a single “malign” episode could lead to disaster, this may be only of academic interest.) Also, to comment on this section of the paper:
“We can only offer informal claims regarding what happens before BoMAI is definitely benign. One intuition is that eventual benignity with probability 1 doesn’t happen by accident: it suggests that for the entire lifetime of the agent, everything is conspiring to make the agent benign.”
If BoMAI can be effectively controlled by alien superintelligences before it becomes “benign”, that would suggest “everything is conspiring to make the agent benign” is misleading as far as reasoning about what BoMAI might do in the meantime.
(if we manage to successfully run enough episodes without in fact having anything bad happen in the meantime, which is an assumption of the asymptotic arguments)
Is this noted somewhere in the paper, or just implicit in the arguments? I guess what we actually need is either a guarantee that all episodes are “benign” or a bound on utility loss that we can incur through such a scheme. (I do appreciate that “in the absence of any other algorithms for general intelligence which have been proven asymptotically benign, let alone benign for their entire lifetimes, BoMAI represents meaningful theoretical progress toward designing the latter.”)
Is there a way to derive a concrete bound on how long it will take for BoMAI to become “benign”, e.g., is it exponential or something more reasonable?
The closest thing to a discussion of this so far is Appendix E, but I have not yet thought through this very carefully. When you ask if it is exponential, what exactly are you asking if it is exponential in?
When you ask if it is exponential, what exactly are you asking if it is exponential in?
I guess I was asking if it’s exponential in anything that would make BoMAI impractically slow to become “benign”, so basically just using “exponential” as a shorthand for “impractically large”.
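For what it's worth, one toy way to make the question concrete (my framing, not the paper's analysis): if the malign model starts with 2^m times the relative posterior weight of the true model, can only cause trouble while it dominates, and loses a factor c < 1 of relative weight per trouble episode, then the number of trouble episodes is roughly m / log2(1/c), i.e. linear in the log of the prior-weight advantage rather than exponential.

```python
import math

def max_trouble_episodes(m: float, c: float) -> int:
    """Toy bound (not from the paper): smallest k with 2**m * c**k < 1,
    i.e. episodes needed to erase a 2**m relative-weight head start when
    each trouble episode multiplies the malign model's weight by c < 1."""
    return math.ceil(m / math.log2(1 / c))

print(max_trouble_episodes(40, 0.4))  # 31 -- linear in m, not exponential
```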
If BoMAI can be effectively controlled by alien superintelligences before it becomes “benign”, that would suggest “everything is conspiring to make the agent benign” is misleading as far as reasoning about what BoMAI might do in the meantime.
Agreed that would be misleading, but I don’t think it would be controlled by alien superintelligences.
I guess I was asking if it’s exponential in anything that would make BoMAI impractically slow to become “benign”, so basically just using “exponential” as a shorthand for “impractically large”.
I don’t think it is, thank you for pointing this out.