Forgive me for my stupidity (I'm not exactly an expert in machine learning), but it seems to me that building an AGI linked to some sort of database like that, in such a fashion that some random guy's screw-up can effectively reverse the utility function completely, is a REALLY stupid idea. Would there not be a safer way of doing things?

Yes. For example: lots of applications use online learning. A programmer flips the meaning of a boolean flag in a database somewhere without updating all downstream callers, and suddenly an online learner is actively pessimizing its target metric.
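To make that failure mode concrete, here is a minimal sketch of an online learner whose training labels pass through a boolean flag whose meaning gets inverted upstream. Everything here (the flag, the data, the model) is invented for illustration, not taken from any real system:

```python
import math
import random

# Online logistic-regression learner: one weight, one bias, plain SGD.
w, b, lr = 0.0, 0.0, 0.1

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

# Upstream, someone "flips the meaning of a boolean flag in a database":
# True used to mean the positive class; now it means the negative class.
# The training loop below was never updated and assumes the old meaning.
FLAG_MEANING_FLIPPED = True

def observe():
    x = random.uniform(-1.0, 1.0)
    is_positive = x > 0                # ground truth: positive iff x > 0
    stored_flag = is_positive
    if FLAG_MEANING_FLIPPED:
        stored_flag = not stored_flag  # the database now stores the inverse
    return x, stored_flag

for _ in range(5000):
    x, flag = observe()
    y = 1.0 if flag else 0.0           # downstream caller: True == positive
    p = sigmoid(w * x + b)
    # SGD on log loss. With the inverted flag, every update pushes the
    # model toward predicting the *opposite* of the true label.
    w -= lr * (p - y) * x
    b -= lr * (p - y)

# Accuracy against the real labels ends up near 0% instead of near 100%:
# the learner is actively pessimizing the metric it was meant to optimize.
test = [random.uniform(-1.0, 1.0) for _ in range(1000)]
correct = sum((sigmoid(w * x + b) > 0.5) == (x > 0) for x in test)
print(f"accuracy on the true task: {correct / 10:.1f}%")
```

Nothing in the update rule changed; one inverted bit at the data layer is enough to make the learner converge on the exact opposite of the intended behaviour.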
Interesting. Terrifying, but interesting.

Do you think that this specific risk could be mitigated by some variant of Eliezer's separation from hyperexistential risk, or by Stuart Armstrong's idea here:

> Let B1 and B2 be excellent, bestest outcomes. Define U(B1) = 1, U(B2) = −1, and U = 0 otherwise. Then, under certain assumptions about what probabilistic combinations of worlds it is possible to create, maximising or minimising U leads to good outcomes.
>
> Or, more usefully, let X be some trivial feature that the agent can easily set to −1 or 1, and let U be a utility function with values in [0, 1]. Have the AI maximise or minimise XU. Then the AI will always aim for the same best world, just with a different X value.

Or at least prevent sign-flip errors from causing something worse than paperclipping?
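To see why that construction is robust to a sign flip, here is a toy sketch of the quoted X·U idea. The worlds and utility values are made up for illustration:

```python
# U maps worlds to [0, 1]; the "trivial feature" X in {-1, +1} can be set
# freely by the agent alongside its choice of world. (All values invented.)
U = {
    "paperclips": 0.0,
    "mediocre": 0.4,
    "good": 0.8,
    "excellent": 1.0,
}

def best_choice(sign):
    """Optimize sign * X * U(world) over all (world, X) pairs.

    sign = +1 models a correctly wired maximizer of X*U; sign = -1 models
    a sign-flipped agent that ends up minimizing X*U instead.
    """
    return max(
        ((world, x) for world in U for x in (-1, +1)),
        key=lambda wx: sign * wx[1] * U[wx[0]],
    )

print(best_choice(+1))  # ('excellent', 1):  the maximizer picks the best world
print(best_choice(-1))  # ('excellent', -1): the minimizer picks the SAME world
```

Because U is non-negative and X is a free sign bit, negating the whole objective is absorbed by flipping X: the optimum lands on the same best world either way, and only the throwaway feature changes. A sign-flip error then costs nothing worse than a flipped X, rather than producing the worst possible world.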