The main problem with all utility functions for AIs is that the outcome you think you foresee might not be the outcome you get.
That’s the general problem, yes. And here it’s probably not even a utility function we’re using (the penalty function, which involves counterfactual calculations, seems a different kind of beast).
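To make that distinction concrete, here is a toy sketch (my own illustration, not a construction from the original discussion) of a counterfactual penalty term: it compares the AI's forecast of the world given its action against its forecast given a null action, so it is a function of two predictions rather than an ordinary utility over outcomes. The names `world_model`, `null_action`, and the use of total-variation distance are all assumptions made for the example.

```python
# Toy illustration of a counterfactual penalty term (not a real AI system).
# The penalty compares forecasts of the world with and without the AI acting,
# which is why it behaves differently from a utility function over outcomes.

def total_variation(p, q):
    """Distance between two probability distributions over world-states."""
    return 0.5 * sum(abs(p[s] - q[s]) for s in p)

def impact_penalty(world_model, action, null_action="do nothing"):
    """Counterfactual penalty: how much does acting shift the forecast
    relative to the counterfactual in which the AI does nothing?"""
    forecast_with_action = world_model(action)    # dict: state -> probability
    forecast_without = world_model(null_action)   # counterfactual forecast
    return total_variation(forecast_with_action, forecast_without)

# Example with a hypothetical two-state world model:
def toy_world_model(action):
    if action == "do nothing":
        return {"quiet": 0.9, "disrupted": 0.1}
    return {"quiet": 0.3, "disrupted": 0.7}

print(impact_penalty(toy_world_model, "build disciple"))  # 0.6 -> large impact
```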
mostly by means of some humans learning about AIs through your existence.
Yes, this is a problem I alluded to. Here’s a solution (a poor one, but it may show the problem is not completely unsolvable): the disciple is also boxed, hidden from view, surrounded by chaotic processes. And there is another process Y which will (with very low probability) annihilate the disciple in one hour along with everything in the box. We make the master AI indifferent to Y (so, from its perspective, the disciple will have little impact whatever happens in the future, as long as it stays in the box).
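A toy sketch of how the indifference trick could work, under my own reading of it: if the master AI evaluates the disciple's future impact as though the annihilation event Y had fired, then the disciple contributes almost nothing to the forecast so long as it stays in the box. The probabilities and the "condition on Y firing" step are assumptions made purely for illustration.

```python
# Toy illustration (my interpretation): making the master AI "indifferent to Y"
# is modelled here as evaluating future impact conditional on Y firing,
# i.e. as if the boxed disciple will be annihilated within the hour.

P_Y_FIRES = 1e-6  # Y annihilates the disciple with very low probability

def expected_impact(impact_if_annihilated, impact_if_survives,
                    indifferent_to_Y=False):
    if indifferent_to_Y:
        # The master AI reasons as though Y fires: the disciple's long-run
        # impact is just whatever it does inside the box before annihilation.
        return impact_if_annihilated
    # Ordinary expectation over Y, dominated by the survival branch.
    return (P_Y_FIRES * impact_if_annihilated
            + (1 - P_Y_FIRES) * impact_if_survives)

print(expected_impact(0.001, 10.0))                         # ~10: large foreseen impact
print(expected_impact(0.001, 10.0, indifferent_to_Y=True))  # 0.001: negligible
```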
And once reduced-impact AIs come into general use, they can be unboxed: a single success or failure will have little impact on human society.
So, after all the matrioshka-incinerators have finished their little dance, what do you actually have to show for it?