Yes. As tim points out below, the main thing that programmers are taught is “self-modifying programs are almost always more trouble than they’re worth—don’t do it.”
My hunch is that self-modifying AI is far more likely to crash than it is to go FOOM, and that non-self-modifying AI (or AI that self-modifies in very limited ways) may do fairly well by comparison.
My understanding was that the CEV approach is a meta-level approach to stable self-improvement, aiming to design code that outputs what we would want an FAI’s code to look like (or something like this). I could certainly be wrong, of course, and I have very little to go on here: the Knowability of FAI and CEV documents are both vaguer than I would like (since, of course, the problems are still wide open) and several years old, so I have to piece the picture together indirectly.
If that interpretation is correct, it seems (and I stress that I might be totally off base with this) that stable recursive self-improvement over time is not the biggest conceptual concern. Rather, the biggest conceptual difficulty is determining how to derive a coherent goal set from a bunch of Bayesian utility maximizers, each equipped with an individual person’s utility function (and how to extract each person’s utility function in the first place), or something like that. Stable self-improving code would then (hopefully) be extrapolated by the resulting CEV, which is itself the initial dynamic.
My comment wasn’t directed towards CEV at all—CEV sounds like a sensible working definition of “friendly enough”, and I agree that it’s probably computationally hard.
I was suggesting that any program, AI or no, that is coded to rewrite critical parts of itself in substantial ways is likely to go “splat”, not “FOOM”—to degenerate into something that doesn’t work at all.
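As a rough illustration of the asymmetry I mean (a toy sketch only, not anyone’s actual proposal; the names and numbers here are made up for the example): a program that blindly rewrites a critical piece of its own source almost always breaks itself rather than improving itself.

```python
# Toy sketch: a program holds its "critical part" as source text, applies a
# small random edit each generation, and tries to run the result. The point
# is only that blind self-modification overwhelmingly produces crashes
# ("splat"), not improvements ("FOOM").
import random
import string

# The "critical part" the program is allowed to rewrite.
critical_source = "def step(x):\n    return x + 1\n"

def mutate(src):
    """Flip one random character, standing in for a substantial self-edit."""
    i = random.randrange(len(src))
    return src[:i] + random.choice(string.printable) + src[i + 1:]

splats, survivals = 0, 0
for _ in range(1000):
    candidate = mutate(critical_source)
    namespace = {}
    try:
        exec(candidate, namespace)          # "install" the rewritten code
        assert namespace["step"](1) == 2    # does it still do its job?
        survivals += 1
    except Exception:
        splats += 1                         # syntax error, missing function, wrong answer, ...

print(f"splat: {splats}, still working: {survivals}")
```

A real self-modifying AI obviously wouldn’t edit itself at random, but the asymmetry is the point: the space of edits that break a program is vastly larger than the space that improves it, which is why the default outcome of substantial self-rewriting looks more like “splat” than “FOOM”.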