RHollerith comments on G Gordon Worley III’s Shortform

RHollerith 17 May 2022 17:50 UTC
2 points
AF
Can you explain where there is an error term in AlphaGo or where an error term might appear in hypothetical model similar to AlphaGo trained much longer with much more numerous parameters and computational resources?
- Gordon Seidoh Worley 18 May 2022 0:14 UTC
  8 points
  AF Parent
  AlphaGo is fairly constrained in what it’s designed to optimize for, but it still has the standard failure mode of “things we forgot to encode”. So for example AlphaGo could suffer the error of instrumental power grabbing in order to be able to get better at winning Go because we misspecified what we asked it to measure. This is a kind of failure introduced into the systems by humans failing to make $m (X)$ adequately evaluate $X$ as we intended, since we cared about winning Go games while also minimizing side effects, but maybe when we constructed $m (X)$ we forgot about minimizing side effects.