Stuart Russell, in his comment on the Edge.org AI discussion, offered a concise mathematical description of perverse instantiation and seems to suggest that it is likely to occur:
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.
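As a toy illustration of the effect Russell describes (my own sketch, not taken from his comment), here is a small linear program where the objective depends on only one of three variables. The variable names and the "budget" constraint are made up for the example; the point is just that a variable left out of the objective, but coupled to it through a constraint, gets pushed to an extreme value:

```python
# Minimal sketch: maximize the one variable the objective "cares about"
# (x0) and watch the solver drive an unmodeled variable (x1) to an extreme.
import numpy as np
from scipy.optimize import linprog

# n = 3 decision variables; the objective depends only on x0 (k = 1).
# Pretend x1 is something we actually care about (a safety margin, say)
# that was simply left out of the objective.
c = np.array([-1.0, 0.0, 0.0])  # linprog minimizes, so -x0 maximizes x0

# A shared "budget" couples x0 and x1: x0 + x1 <= 10.
A_ub = np.array([[1.0, 1.0, 0.0]])
b_ub = np.array([10.0])

bounds = [(0, 10)] * 3  # each variable may range over [0, 10]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x)  # ~[10., 0., 0.]: x1 is pinned to its extreme value, 0
```

The solver spends the entire budget on x0 and forces x1 to zero; the leftover variable x2 also lands at a bound, since LP solvers return vertex solutions. This is the linear-programming analogue of Russell's point: the unconstrained variables end up at extremes even though nothing in the problem statement asked for that.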
I'm curious whether there is more information about this behavior occurring in practice, ideally documented examples.