“The sort of thing that would change my mind: there’s some widespread phenomenon in machine learning that perplexes most, but is expected according to your model”
My position is that there are many widespread phenomena in human cognition that are expected according to my model, and which can only be explained by the more mainstream ML models either if said models are contorted into weird shapes, or if they engage in denialism of said phenomena.
Again, the drive for consistent decision-making is a good example. Common-sensically, I don’t think we’d disagree that humans want their decisions to be consistent. They don’t want to engage in wild mood swings, they don’t want to oscillate wildly between which career they want to pursue or whom they want to marry: they want to figure out what they want and who they want to be with, and then act consistently with these goals in the long term. Even when they make allowances for changing their mind, they try to consistently optimize for making said allowances: for giving their future selves freedom/optionality/resources.
Yet it’s not something that, e.g., Shard Theory would naturally predict out of the box, last I checked. You’d need to add structures on top of it until it basically replicates my model (which is essentially how I arrived at my model, in fact – see this historical artefact).
“My position is that there are many widespread phenomena in human cognition that are expected according to my model, and which can only be explained by the more mainstream ML models either if said models are contorted into weird shapes, or if they engage in denialism of said phenomena.”
Such as? I wouldn’t call Shard Theory mainstream, and I’m not saying mainstream models are correct either. On humans’ trying to be consistent decision-makers, I have some theories about that (some of which are probably wrong). But judging by how bad humans are at it, and how much they struggle to do it, they probably weren’t optimized too strongly biologically to do it. But memetically, developing ideas for consistent decision-making was probably useful, so we have software that makes use of our processing power to be better at this, even if the hardware is very stubborn at times. But even that isn’t optimized too hard toward coherence. Someone might prefer pizza to hot dogs, but they probably won’t always choose pizza over any other food, just because they want their preference ordering of food to be consistent. And, sure, maybe what they “truly” value is something like health, but I imagine even if they didn’t, they still wouldn’t do this.
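To make that concrete, here’s a minimal toy sketch (my own illustration, with made-up foods and scores, not anything from Shard Theory or the comment above): a strict maximizer always picks the top of a fixed preference ordering, while a satisficer accepts anything “good enough”, so its observed choices vary even though the underlying ordering never changes.

```python
import random

# Made-up, fixed, transitive preference ordering (higher = more preferred).
PREFERENCE = {"pizza": 4, "sushi": 3, "salad": 2, "hot dog": 1}
FOODS = list(PREFERENCE)

def strict_maximizer(options):
    """Always picks the single most-preferred option: perfectly 'consistent' choices."""
    return max(options, key=PREFERENCE.get)

def satisficer(options, threshold=2):
    """Picks any acceptable option at random: the ordering still exists,
    but the observed choices look inconsistent from the outside."""
    acceptable = [o for o in options if PREFERENCE[o] >= threshold]
    return random.choice(acceptable or list(options))

print([strict_maximizer(FOODS) for _ in range(5)])  # ['pizza', 'pizza', ...]
print([satisficer(FOODS) for _ in range(5)])        # varies from run to run
```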
But all of this is still just one piece on the Jenga tower. And we could debate every piece in the tower, and even get 90% confidence that every piece is correct… but if there are more than 10 pieces on the tower, the whole thing is still probably going to come crashing down. (This is the part where I feel obligated to say, even though I shouldn’t have to, that your tower being wrong doesn’t mean “everything will be fine and we’ll be safe”, since the “everything will be fine” towers are looking pretty Jenga-ish too. I’m not saying we should just shrug our shoulders and embrace uncertainty. What I want is to build non-Jenga-ish towers.)
Fair. What would you call a “mainstream ML theory of cognition”, though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis[1]).
“judging by how bad humans are at [consistent decision-making], and how much they struggle to do it, they probably weren’t optimized too strongly biologically to do it. But memetically, developing ideas for consistent decision-making was probably useful, so we have software that makes use of our processing power to be better at this”
Roughly agree, yeah.
“But all of this is still just one piece on the Jenga tower”
I kinda want to push back against this repeated characterization – I think quite a lot of my model’s features are “one storey tall”, actually – but it probably won’t be a very productive use of either of our time. I’ll get around to the “find papers empirically demonstrating various features of my model in humans” project at some point; that should be a better starting point for discussion.
“I’m not saying we should just shrug our shoulders and embrace uncertainty. What I want is to build non-Jenga-ish towers.”

Agreed. Working on it.

[1] Which, yeah, I think is false: scaling LLMs won’t get you to AGI. But it’s also kinda unfalsifiable using empirical methods, since you can always claim that another 10x scale-up will get you there.
“Fair. What would you call a ‘mainstream ML theory of cognition’, though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis).”
It tends not to get talked about much today, but there was the PDP (connectionist) camp of cognition vs. the camp of “everything else” (including ideas such as symbolic reasoning, etc.). The connectionist camp created a rough model of how they thought cognition worked, and a lot of cognitive scientists scoffed at it. Hinton tried putting it into actual practice, but it took several decades for it to be demonstrated to actually work. I think a lot of people were confused by why the “stack more layers” approach kept working, but under the model of connectionism, this is expected. Connectionism is kind of too general to make great predictions, but it doesn’t seem to allow for FOOM-type scenarios. It also seems to favor agents as local-optima satisficers rather than greedy utility maximizers.
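As a toy illustration of that last point (my own sketch, with a made-up objective, not anything from the PDP literature): plain gradient descent on a non-convex loss settles into whichever basin it happens to start in – a “good enough” local optimum – rather than globally optimizing anything.

```python
# Toy example only: the loss function and numbers are made up for illustration.

def loss(x):
    # Non-convex, with a deeper basin near x ≈ -1.06 and a shallower one near x ≈ +0.93.
    return x**4 - 2 * x**2 + 0.5 * x

def grad(x):
    return 4 * x**3 - 4 * x + 0.5

def descend(x, lr=0.01, steps=2000):
    # Plain gradient descent: just follow the local slope downhill.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

for start in (-2.0, 2.0):
    end = descend(start)
    print(f"start {start:+.1f} -> settles at x = {end:+.3f}, loss = {loss(end):+.3f}")
# The run starting at +2.0 ends in the shallower basin: a local optimum it can
# live with, not the global one it would pick if it were a global maximizer.
```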