In general, I’m a bit unsure about how much of an interpretability advantage we get from slicing the model up into chunks. If the pieces are trained separately, then we can reason about each part individually based on its training procedure. In the optimistic scenario, this means that the computation happening in the part of the system labeled “world model” is actually something humans would call world modelling. This is definitely helpful for interpretability. But the alternative possibility is that we get one or more mesa-optimizers, which seems less interpretable.
I for one am moderately optimistic that the world-model can actually remain “just” a world-model (and not a secret deceptive world-optimizer), and that the value function can actually remain “just” a value function (and not a secret deceptive world-optimizer), and so on, for reasons in my post Thoughts on safety in predictive learning—particularly the idea that the world-model data structure / algorithm can be relatively narrowly tailored to being a world-model, and the value function data structure / algorithm can be relatively narrowly tailored to being a value function, etc.
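To make the "separately trained chunks" picture concrete, here is a minimal toy sketch, not LeCun's actual architecture; the module names, shapes, and objectives are all hypothetical choices of mine. The point it illustrates is just that the world-model and the value function can each be a narrowly tailored module with its own loss and its own optimizer, so neither is shaped by gradients from the other's objective.

```python
import torch
import torch.nn as nn

# Toy illustration (NOT LeCun's actual architecture): two separately trained
# modules, each with a deliberately narrow objective. All names, shapes, and
# hyperparameters here are hypothetical.

class WorldModel(nn.Module):
    """Predicts the next observation from the current observation and action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class ValueFunction(nn.Module):
    """Scores an observation; trained only on value regression."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs)

# Each module gets its own loss and its own optimizer, so gradients from the
# value objective never flow into the world model (and vice versa).
world_model = WorldModel(obs_dim=8, act_dim=2)
value_fn = ValueFunction(obs_dim=8)
wm_opt = torch.optim.Adam(world_model.parameters(), lr=1e-3)
vf_opt = torch.optim.Adam(value_fn.parameters(), lr=1e-3)

# Placeholder batch standing in for logged transitions and value targets.
obs = torch.randn(32, 8)
act = torch.randn(32, 2)
next_obs = torch.randn(32, 8)
value_target = torch.randn(32, 1)

# World-model update: pure next-state prediction loss.
wm_loss = nn.functional.mse_loss(world_model(obs, act), next_obs)
wm_opt.zero_grad(); wm_loss.backward(); wm_opt.step()

# Value-function update: pure value regression loss.
vf_loss = nn.functional.mse_loss(value_fn(obs), value_target)
vf_opt.zero_grad(); vf_loss.backward(); vf_opt.step()
```

The separate training loops are the crux: if the world-model is only ever pushed to predict well, there is at least a plausible story for why the thing it learns stays "just" a world-model.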
Since LeCun’s architecture, taken as a whole, is a kind of optimizer (I agree with Algon that it’s probably a utility maximizer), the emergence of additional mesa-optimizers seems less likely.
We expect optimization to emerge because it’s a powerful algorithm for SGD to stumble upon, one that outcompetes the alternatives. But if the system is already an optimizer, where is the selection pressure to produce another one coming from?
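Continuing the toy modules from the sketch above, here is the sense in which the assembled system is "already an optimizer": planning can be an explicit search over candidate actions, scored with the world model and value function, rather than something SGD has to rediscover inside a monolithic network. Again, this is purely illustrative, not LeCun's actual planning procedure.

```python
# Toy planner: explicit search over candidate actions, scored by imagining
# outcomes with the world model and evaluating them with the value function.
# The argmax at the end is the "optimizer" in the assembled system.

def plan(world_model, value_fn, obs, num_candidates: int = 128):
    """Pick the action whose predicted next state the value function scores highest."""
    candidates = torch.randn(num_candidates, 2)              # random candidate actions
    obs_batch = obs.expand(num_candidates, -1)               # same current state for each candidate
    with torch.no_grad():
        predicted_next = world_model(obs_batch, candidates)  # imagined outcomes
        scores = value_fn(predicted_next).squeeze(-1)        # how good each outcome looks
    return candidates[scores.argmax()]                       # explicit argmax over candidates

best_action = plan(world_model, value_fn, torch.randn(1, 8))
```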
It’s coming from the fact that every module wants to be an optimizer of something in order to do its job.
Interesting. I wonder how the dynamics of a system with multiple mesa-optimizers would play out (if such a thing is even possible).