Rubi J. Hudson comments on Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural

Rubi J. Hudson 18 Jul 2024 0:39 UTC
LW: 1 AF: 1
I agree that, in theory, uncertainty about the goal is helpful. However, the true main goal has to be among the goals under consideration; otherwise, resisting a modification that adds it is beneficial under every goal that is under consideration. How to ensure the true goal is included seems like a very difficult open problem.
- RogerDearnaley 22 Jul 2024 23:58 UTC
  2 points
  That’s not necessarily required. The Scientific Method works even if the true “Unified Field Theory” isn’t yet under consideration, merely some theories that are closer to it and others further away from it: it’s possible to make iterative progress.
  
  In practice, considered as search processes, the Scientific Method, Bayesianism, and stochastic gradient descent all tend to find similar answers. Yet, unlike Bayesianism, gradient descent doesn’t explicitly consider every point in the space, including the true optimum; it just searches for nearby better points. It can of course get trapped in local minima: Singular Learning Theory highlights why that’s less of a problem in practice than it sounds in theory.
  
  The important question here is how good an approximation the search algorithm in use is to Bayesianism. As long as the AI understands that what it’s doing is (like the scientific method and stochastic gradient descent) a computationally efficient approximation to the computationally intractable ideal of Bayesianism, then it won’t resist the process of coming up with new possibly-better hypotheses, it will instead regard that as a necessary part of the process (like hypothesis creation in the scientific method, the mutational/crossing steps in an evolutionary algorithm, or the stochastic batch noise in stochastic gradient descent).
  - Rubi J. Hudson 24 Jul 2024 5:29 UTC
    1 point
    None of that is wrong, but it misses the main issue with corrigibility, which is that the approximation resists further refinement. That’s why, for this approach to work, the correct utility function would need to be in the ensemble from the start.
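
To make the two positions in this thread a little more concrete, here is a minimal sketch of exact Bayesian updating over a small, finite ensemble of hypotheses. Everything in it is an illustrative assumption rather than anything the commenters specified: the "hypotheses" are coin biases standing in for candidate goals, the true bias is taken to be 0.7, and the data (700 heads, 300 tails) is fixed by hand. It shows only the pure-Bayesian analogue of the argument and does not model an agent's incentive to resist modification.

```python
import math

def posterior(priors, biases, heads, tails):
    """Exact Bayesian posterior over a finite set of coin-bias hypotheses."""
    log_weights = []
    for prior, bias in zip(priors, biases):
        if prior == 0.0:
            # A hypothesis with zero prior can never gain weight.
            log_weights.append(-math.inf)
        else:
            log_weights.append(
                math.log(prior)
                + heads * math.log(bias)
                + tails * math.log(1.0 - bias)
            )
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data from a coin whose true bias is assumed to be 0.7.
heads, tails = 700, 300

# Case 1: the true hypothesis is missing from the ensemble.
biases = [0.2, 0.5, 0.6]
post_missing = posterior([1 / 3, 1 / 3, 1 / 3], biases, heads, tails)

# Case 2: the true hypothesis is listed, but with zero prior weight.
post_zero = posterior([1 / 3, 1 / 3, 1 / 3, 0.0], biases + [0.7], heads, tails)

# Case 3: the true hypothesis starts with a small but nonzero prior.
post_small = posterior([0.333, 0.333, 0.333, 0.001], biases + [0.7], heads, tails)

print("truth missing:   ", post_missing)
print("truth, prior 0:  ", post_zero)
print("truth, prior > 0:", post_small)
```

In case 1 the posterior concentrates on 0.6, the closest available hypothesis, which is the sense in which progress is possible without the truth being under consideration. In case 2 the true hypothesis stays at exactly zero however much data arrives, while in case 3 a tiny nonzero prior is enough for it to dominate. Under these assumptions, getting from case 2 to case 3 requires a step outside the update rule itself (adding or reweighting a hypothesis), and whether the system accepts or resists that step is the point the thread disagrees about.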