The default explanation I’d heard for “the human brain naturally focusing on negative considerations”, or “the human body experiencing more pain than pleasure”, was that, in the ancestral environment, there were many catastrophic events to run away from, but not many incredibly positive events to run towards: having sex once is not as good as dying is bad (for inclusive genetic fitness).
But maybe there’s another, more general factor, one that doesn’t rely on these environmental details but rather on deeper mathematical properties: Say you are an algorithm being constantly tweaked by a learning process.

Say on input X you produce output (action) Y, leading to a good outcome (meaning, one of the outcomes the learning process likes, whatever that means). Sure, the learning process can tweak your algorithm in some way to ensure that X → Y is even more likely in the future. But even if it doesn’t, by default, next time you receive input X you will still produce Y (since the learning algorithm hasn’t changed you, and ignoring noise). You are, in some sense, already taking the correct action (or at least, an acceptably correct one).

Say on input X’ you produce output Y’, leading instead to a bad outcome. If the learning process changes nothing, next time you find X’ you’ll do the same. So the process really needs to change your algorithm right now, and can’t fall back on your existing default behavior.
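To make the asymmetry concrete, here is a toy sketch (the setup is entirely made up, purely an illustration of the argument): a deterministic lookup-table policy that is only ever modified after bad outcomes still ends up acting acceptably on every input, because good behavior simply persists by default.

```python
import random

random.seed(0)

# Toy setup (entirely hypothetical): 5 possible inputs, 3 possible actions.
# Each input has one "acceptable" action; anything else counts as a bad outcome.
inputs = list(range(5))
actions = list(range(3))
acceptable = {x: random.choice(actions) for x in inputs}

# Deterministic policy: a lookup table, initialized arbitrarily (to action 0).
policy = {x: 0 for x in inputs}

for _ in range(1000):
    x = random.choice(inputs)
    y = policy[x]
    if y == acceptable[x]:
        # Good outcome: the learning process does nothing. By default, the same
        # input will keep producing the same (acceptable) action next time.
        pass
    else:
        # Bad outcome: by default the mistake would repeat, so the policy has
        # to be changed now; here, just by trying a different action.
        policy[x] = random.choice([a for a in actions if a != y])

print(sum(policy[x] == acceptable[x] for x in inputs), "of", len(inputs), "inputs handled acceptably")
```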
Of course, many other factors make it the case that such a naive story isn’t the full picture:
Maybe there’s noise, so it’s not guaranteed you behave the same on every input.
Maybe the negative tweaks make the positive behavior (on other inputs) slowly wither away (like circuit rewriting in neural networks), so you need to reinforce positive behavior for it to stick.
Maybe the learning algorithm doesn’t have a clear notion of “positive and negative”, and instead just provides updates in the same direction (but with different intensities) for different outcome intensities, on a scale without an origin. (But this seems very different from the current paradigm, and fundamentally wasteful.)
Still, I think I’m pointing at something real, like “on average across environments punishing failures is more valuable than reinforcing successes”.
Maybe the learning algorithm doesn’t have a clear notion of “positive and negative”, and instead just provides updates in the same direction (but with different intensities) for different outcome intensities, on a scale without an origin. (But this seems very different from the current paradigm, and fundamentally wasteful.)
Maybe I don’t understand your intent, but isn’t this exactly the current paradigm? You train a network using the derivative of the loss function. Adding a constant to the loss function changes nothing. So, I don’t see how it’s possible to have a purely ML-based explanation of where humans consider the “origin” to be.
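As a concrete illustration of that invariance (a minimal toy example, nothing brain-specific): adding a constant to the loss leaves its gradient, and therefore the training update, exactly unchanged.

```python
import torch

torch.manual_seed(0)

# Hypothetical tiny linear model and data, just to check the invariance numerically.
w = torch.randn(3, requires_grad=True)
x = torch.randn(10, 3)
y = torch.randn(10)

def loss(w, offset=0.0):
    return ((x @ w - y) ** 2).mean() + offset

loss(w).backward()                 # gradient of the loss as-is
g_plain = w.grad.clone()
w.grad = None

loss(w, offset=1000.0).backward()  # gradient of the loss plus an arbitrary constant
g_shifted = w.grad.clone()

print(torch.allclose(g_plain, g_shifted))  # True: the "origin" of the loss scale is invisible to the update
```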
You’re right! I had mistaken the derivative for the original function.
Probably this slip happened because I was also thinking of the following: Embedded learning can’t ever be modelled as taking such an (origin-agnostic) derivative.

When in ML we take the gradient in the loss landscape, we are literally taking (or approximating) a counterfactual: “If my algorithm was a bit more like this, would I have performed better in this environment? (For example, would my prediction have been closer to the real next token?)”

But in embedded reality there’s no way to take this counterfactual: You just have your past and present observations, and you don’t necessarily know whether you’d have obtained more or less reward had you moved your hand a bit more like this (taking the fruit to your mouth) or like that (moving it away).
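One way to make the contrast concrete (a sketch under my reading of the point, using the standard score-function estimator rather than anything brain-specific): with a differentiable loss you can evaluate the counterfactual “would a slightly different output have scored better?” directly, whereas with only the reward you actually received you have to estimate the update from samples of your own behavior.

```python
import torch
import torch.nn.functional as F

logits = torch.zeros(3, requires_grad=True)  # hypothetical 3-action "policy"

# Supervised case: the loss is a known, differentiable function of the output,
# so backprop literally answers the counterfactual "would a slightly different
# output have scored better?"
target = torch.tensor(2)
loss = F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
loss.backward()
print("counterfactual gradient:", logits.grad)

# Embedded case: the environment only reports the reward of the action actually
# taken; there is no derivative of reward with respect to the action. The usual
# fallback is the score-function (REINFORCE) estimator: reinforce the
# log-probability of sampled actions in proportion to the reward they produced.
logits.grad = None

def env_reward(action):  # black box: can be queried, not differentiated
    return 1.0 if action == 2 else 0.0

probs = torch.softmax(logits, dim=0)
action = torch.multinomial(probs, 1).item()
surrogate = -env_reward(action) * torch.log(probs[action])
surrogate.backward()
print("single-sample estimate of the gradient:", logits.grad)
```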
Of course, one way to solve this is to learn a reward model inside your brain, which can learn without any counterfactuals (just considering whether the prediction was correct, or how “close” it was for some definition of close). And then another part of the brain is trained to approximate argmaxing the reward model.
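A minimal sketch of that two-part arrangement in standard ML terms (all module names and sizes here are hypothetical; this is the generic “learned reward model plus policy trained against it” pattern, not a claim about actual brain circuitry): one network learns to predict reward from experienced (observation, action, reward) triples, which requires no counterfactuals, and a second network is then trained to put probability on actions the learned model scores highly.

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 3  # hypothetical sizes

# Part 1: reward model, trained by plain supervised prediction on experienced
# (observation, action, reward) triples; no counterfactuals required.
reward_model = nn.Sequential(nn.Linear(obs_dim + n_actions, 32), nn.ReLU(), nn.Linear(32, 1))
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

def train_reward_model(obs, action_onehot, reward):
    pred = reward_model(torch.cat([obs, action_onehot], dim=-1)).squeeze(-1)
    loss = ((pred - reward) ** 2).mean()
    rm_opt.zero_grad(); loss.backward(); rm_opt.step()

# Part 2: policy, trained to (approximately) argmax the reward model's prediction.
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def train_policy(obs):
    probs = torch.softmax(policy(obs), dim=-1)                        # (batch, n_actions)
    all_actions = torch.eye(n_actions).expand(obs.shape[0], -1, -1)   # every action, one-hot
    obs_rep = obs.unsqueeze(1).expand(-1, n_actions, -1)
    pred_r = reward_model(torch.cat([obs_rep, all_actions], dim=-1)).squeeze(-1)
    loss = -(probs * pred_r.detach()).sum(-1).mean()  # push probability toward high predicted reward
    pi_opt.zero_grad(); loss.backward(); pi_opt.step()

# Example: one step of each on random data (hypothetical experience).
obs = torch.randn(8, obs_dim)
a = torch.eye(n_actions)[torch.randint(0, n_actions, (8,))]
r = torch.randn(8)
train_reward_model(obs, a, r)
train_policy(obs)
```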
But another effect, that I’d also expect to happen, is that (either through this reward model or other means) the brain learns a “baseline of reward” (the “origin”) based on past levels of dopamine or whatever, and then reinforces things that go over that baseline, and disincentivizes those that go below (also proportionally to how far they are from the baseline). Basically the hedonic treadmill. I also think there’s some a priori argument for this helping with computational frugality, in case you change environments (and start receiving much more or much less reward).
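In standard RL vocabulary this is an advantage-style update against a running baseline; a toy sketch (the exponential moving average here is my own illustration, not a claim about the actual mechanism):

```python
# Advantage-style update signal against a running baseline (toy numbers).
baseline, beta = 0.0, 0.1   # beta: how fast the baseline adapts

def update_signal(reward):
    global baseline
    advantage = reward - baseline           # above baseline: reinforce; below: punish
    baseline += beta * (reward - baseline)  # hedonic treadmill: baseline drifts toward recent reward
    return advantage

# A sudden move to a "richer" environment is exciting at first, then becomes the new normal.
for r in [1.0] * 5 + [5.0] * 15:
    print(round(update_signal(r), 3))
```

After the jump from reward 1.0 to 5.0, the printed signal is large at first and then decays toward zero as the baseline catches up, which is the treadmill behavior described above.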
I don’t think embeddedness has much to do with it. And I disagree that it’s incompatible with counterfactuals. For example, infra-Bayesian physicalism is fully embedded and has a notion of counterfactuals. I expect any reasonable alternative to have them as well.
It’s an interesting point. OTOH, your first two counterpoints are clearly true: there’s immense “noise” in natural environments, and no two situations come close to repeating, so doing the right thing once doesn’t remotely ensure doing it again. But the trend is in the right direction, so your point stands at reduced strength.
Negative tweaks definitely wither away the positive behavior; overwriting behavior is the nature of networks, although how strongly this applies varies. I don’t know how much experiments have shown this to occur; it’s always going to be specific to the overlap in circumstances.
Your final counterpoint almost certainly isn’t true in human biology/learning. There’s a zero point on the scale: no net change in dopamine release, which happens when results match the expected outcome. Dopamine directly drives learning, although in somewhat complex ways in different brain regions. The basal ganglia system appears to perform RL much like many ML systems, while the cortex appears to do something related, learning more about whatever happened just before dopamine release rather than learning to perform a specific action as such.
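That zero point “when results match the expected outcome” is exactly the shape of a temporal-difference error; a generic sketch (textbook TD(0), not a model of any particular brain region):

```python
# TD(0): the teaching signal is the reward prediction error,
# delta = r + gamma * V(s') - V(s), which is zero once the outcome matches
# what was already predicted, the analogue of "no net change in dopamine".
gamma, alpha = 0.9, 0.1
V = {"cue": 0.0, "outcome": 0.0}

def td_update(s, r, s_next):
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

# Repeated cue -> reward: the error is large at first, then shrinks toward zero
# as the reward becomes fully expected.
for _ in range(50):
    delta = td_update("cue", 1.0, "outcome")
print(round(delta, 4), round(V["cue"], 4))
```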
But it’s also definitely true that death is much worse than any single positive event (for humans), since you can’t be sure of raising a child to adulthood just by having sex once. The most important thing is to stay in the game.
So both are factors.
But observe the effect of potential sex on adolescent males, and I think we’ll see that the risk of death isn’t all that much stronger an influence ;)