If dualism holds for Abram’s prediction AI, the “Predict-O-Matic”, its world model may happen to include this thing called the Predict-O-Matic which seems to make accurate predictions—but it’s not special in any way and isn’t being modeled any differently than anything else in the world. Again, I think this is a pretty reasonable guess for the Predict-O-Matic’s default behavior. I suspect other behavior would require special code which attempts to pinpoint the Predict-O-Matic in its own world model and give it special treatment (an “ego”).
I don’t see why we should expect this. We’re told that the Predict-O-Matic is being trained with something like SGD, and SGD doesn’t really care about whether the model it’s implementing is dualist or non-dualist; it just tries to find a model that generates a lot of reward. In particular, this seems wrong to me:
The Predict-O-Matic doesn’t care about looking bad, and there’s nothing contradictory about it predicting that it won’t make the very prediction it makes, or something like that.
If the Predict-O-Matic has a model that makes bad predictions (i.e. looks bad), that model will be selected against. And if it accidentally stumbled upon a model that could correctly think about its own behaviour in a non-dualist fashion, and find fixed points, that model would be selected for (since its predictions come true). So at least in the limit of search and exploration, we should expect SGD to end up with a model that finds fixed points, if we train it in a situation where its predictions affect the future.
If we only train it on data where it can’t affect the data that it’s evaluated against, and then freeze the model, I agree that it probably won’t exhibit this kind of behaviour; is that the scenario that you’re thinking about?
it just tries to find a model that generates a lot of reward
SGD searches for a set of parameters which minimize a loss function. Selection, not control.
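To make the “selection, not control” picture concrete, here is a minimal toy sketch (my own example, assuming a linear model and squared loss, nothing Predict-O-Matic-specific): SGD treats the dataset as fixed and simply moves parameters downhill on the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed dataset of (situation, outcome) pairs; nothing the model
# does can change these values.
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.01
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= lr * grad                         # move downhill; pure selection on loss

print(np.round(w, 2))  # close to true_w
```

The data never reacts to the parameters, so in this setup there is nothing for a fixed-point-seeking model to be selected for.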
If the Predict-O-Matic has a model that makes bad predictions (i.e. looks bad), that model will be selected against.
Only if that info is included in the dataset that SGD is trying to minimize a loss function with respect to.
And if it accidentally stumbled upon a model that could correctly think about its own behaviour in a non-dualist fashion, and find fixed points, that model would be selected for (since its predictions come true).
Suppose we’re running SGD trying to find a model which minimizes the loss over a set of (situation, outcome) pairs. Suppose some of the situations are situations in which the Predict-O-Matic made a prediction, and that prediction turned out to be false. It’s conceivable that SGD could learn that the Predict-O-Matic predicting something makes it less likely to happen and use that as a feature. However, this wouldn’t be helpful because the Predict-O-Matic doesn’t know what prediction it will make at test time. At best it could infer that some of its older predictions will probably end up being false and use that fact to inform the thing it’s currently trying to predict.
If we only train it on data where it can’t affect the data that it’s evaluated against, and then freeze the model, I agree that it probably won’t exhibit this kind of behaviour; is that the scenario that you’re thinking about?
Not necessarily. The scenario I have in mind is the standard ML scenario where SGD is just trying to find some parameters which minimize a loss function which is supposed to approximate the predictive accuracy of those parameters. Then we use those parameters to make predictions. SGD isn’t concerned with future hypothetical rounds of SGD on future hypothetical datasets. In some sense, it’s not even concerned with predictive accuracy except insofar as training data happens to generalize to new data.
If you think including historical observations of a Predict-O-Matic (which happens to be ‘oneself’) making bad (or good) predictions in the Predict-O-Matic’s training dataset will cause a catastrophe, that’s within the range of scenarios I care about, so please do explain!
By the way, if anyone wants to understand the standard ML scenario more deeply, I recommend this class.
I think our disagreement comes from you imagining offline learning, while I’m imagining online learning. If we have a predefined set of (situation, outcome) pairs, then the Predict-O-Matic’s predictions obviously can’t affect the data that it’s evaluated against (the outcome), so I agree that it’ll end up pretty dualistic. But if we put a Predict-O-Matic in the real world, let it generate predictions, and then define the loss according to what happens afterwards, a non-dualistic Predict-O-Matic will be selected for over dualistic variants.
If you still disagree with that, what do you think would happen (in the limit of infinite training time) with an algorithm that just made a random change proportional to how wrong it was, at every training step? Thinking about SGD is a bit complicated, since it calculates the gradient while assuming that the data stays constant, but if we use online training on an algorithm that just tries things until something works, I’m pretty confident that it’d end up looking for fixed points.
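For concreteness, here is a toy sketch of that proposal (my own construction, assuming a one-number “model” and a made-up world where the announced prediction shifts what happens): at every step the learner makes a random change whose size is proportional to how wrong it just was.

```python
import numpy as np

rng = np.random.default_rng(0)

def outcome(prediction):
    # Toy world where the announced prediction shifts the outcome;
    # the unique fixed point is prediction = 2.0.
    return 0.5 * prediction + 1.0

p = 0.0  # the "model": a single number it always predicts
history = []
for _ in range(20000):
    o = outcome(p)
    error = abs(o - p)
    p += rng.normal(scale=error)  # random change proportional to how wrong it was
    history.append(p)

print(np.mean(history[-1000:]))  # ≈ 2.0, the fixed point
```

The random steps shrink only near the fixed point, so that is where the learner ends up sitting, without ever computing a gradient.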
But if we put a Predict-O-Matic in the real world, let it generate predictions, and then define the loss according to what happens afterwards, a non-dualistic Predict-O-Matic will be selected for over dualistic variants.
Yes, that sounds more like reinforcement learning. It is not the design I’m trying to point at in this post.
If you still disagree with that, what do you think would happen (in the limit of infinite training time) with an algorithm that just made a random change proportional to how wrong it was, at every training step?
That description sounds a lot like SGD. I think you’ll need to be crisper for me to see what you’re getting at.
Yes, that sounds more like reinforcement learning. It is not the design I’m trying to point at in this post.
Ok, cool, that explains it. I guess the main difference between RL and online supervised learning is whether the model takes actions that can affect its environment or only makes predictions about fixed data; so it seems plausible that someone training the Predict-O-Matic like that would think they’re doing supervised learning, while they’re actually closer to RL.
That description sounds a lot like SGD. I think you’ll need to be crisper for me to see what you’re getting at.
No need, since we already found the point of disagreement. (But if you’re curious, the difference is that SGD makes a change in the direction of the gradient, and this one wouldn’t.)
it seems plausible that someone training the Predict-O-Matic like that would think they’re doing supervised learning, while they’re actually closer to RL.
Assuming that people don’t think about the fact that the Predict-O-Matic’s predictions can affect reality (which seems like it might have been true early on in the story, although it’s admittedly unlikely to be true for too long in the real world), they might decide to train it by letting it make predictions about the future (defining and backpropagating the loss once the future comes about). They might think that this is just like training on predefined data, but now the Predict-O-Matic can change the data that it’s evaluated against, so there might be any number of ‘correct’ answers (rather than exactly 1). Although it’s a blurry line, I’d say this makes its output more action-like and less prediction-like, so you could say that it makes the training process a bit more RL-like.
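A toy illustration of the “any number of ‘correct’ answers” point (my own example, assuming a binary self-fulfilling event, e.g. whether a bank run happens):

```python
def outcome(prediction):
    # Toy self-fulfilling world: whichever of two equilibria the prediction
    # is closer to is the one that actually happens.
    return 1.0 if prediction >= 0.5 else 0.0

# Both extremes are 'correct answers': each achieves zero loss.
for p in [0.0, 1.0]:
    loss = (outcome(p) - p) ** 2
    print(p, loss)  # 0.0 0.0, then 1.0 0.0
```

With a predefined dataset exactly one answer would minimize the loss; once the outcome depends on the prediction, several different outputs are all perfectly “accurate”.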
I think it depends on internal details of the Predict-O-Matic’s prediction process. If it’s still using SGD, SGD is not going to play the future forward to see the new feedback mechanism you’ve described and incorporate it into the loss function which is being minimized. However, it’s conceivable that given a dataset about its own past predictions and how they turned out, the Predict-O-Matic might learn to make its predictions “more self-fulfilling” in order to minimize loss on that dataset?
SGD is not going to play the future forward to see the new feedback mechanism you’ve described and incorporate it into the loss function which is being minimized
My ‘new feedback mechanism’ is part of the training procedure. It’s not going to be good at that by ‘playing the future forward’; it’s going to become good at that by being trained on it.
I suspect we’re using SGD in different ways, because everything we’ve talked about seems like it could be implemented with SGD. Do you agree that letting the Predict-O-Matic predict the future and rewarding it for being right, RL-style, would lead to it finding fixed points? Because you can definitely use SGD to do RL (first google result).
I suspect we’re using SGD in different ways, because everything we’ve talked about seems like it could be implemented with SGD. Do you agree that letting the Predict-O-Matic predict the future and rewarding it for being right, RL-style, would lead to it finding fixed points? Because you can definitely use SGD to do RL (first google result).
Fair enough, I was thinking about supervised learning.