Some incomplete brief replies:

> Huemer… indeed seems confused about all sorts of things
Sure, I was just searching for professional philosopher takes on the indifference principle, and that chapter in Paradox Lost was among the first things I found.
> Separately, “reductionism as a general philosophical thesis” does not imply the thing you call “goal reductionism”
Did you see the footnote I wrote on this? I give a further argument for it.
> …doesn’t mean the end-to-end trained system will turn out non-modular.
I looked into modularity for a bit about 1.5 years ago and concluded that the concept was way too vague to be useful for alignment or interpretability purposes. If you have a good definition, I’m open to hearing it.
> There are good reasons behaviorism was abandoned in psychology, and I expect those reasons carry over to LLMs.
To me it looks like people abandoned behaviorism for pretty bad reasons. The ongoing replication crisis in psychology does not inspire confidence in that field’s ability to correctly diagnose bullshit.
That said, I don’t think my views depend on behaviorism being the best framework for human psychology. The case for behaviorism is much, much stronger for AI: the equations for an algorithm like REINFORCE or DPO directly push up the probability of some actions and push down the probability of others.
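To make that last point concrete, here is a minimal sketch of a single REINFORCE-style update on a softmax policy (my illustration, not something from the thread): a positive reward on a sampled action directly raises that action's probability and lowers the others'.

```python
import numpy as np

def probs(z):
    """Softmax over logits, computed stably."""
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.zeros(3)               # policy parameters: uniform over 3 actions
action, reward, lr = 0, 1.0, 0.5   # suppose action 0 was sampled and got reward +1

# Gradient of log pi(action) w.r.t. the logits of a softmax policy
# is one_hot(action) - probs(logits).
grad_logp = np.eye(3)[action] - probs(logits)

# REINFORCE step: move parameters along reward * grad log-prob.
logits += lr * reward * grad_logp

# The rewarded action's probability went up from 1/3; the others went down.
print(probs(logits))
```

DPO has the same flavor: its loss directly raises the log-probability of the preferred completion relative to the rejected one.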
> Did you see the footnote I wrote on this? I give a further argument for it.
Ah yeah, I indeed missed that the first time through. I’d still say I don’t buy it, but that’s a more complicated discussion, and it is at least a decent argument.
> I looked into modularity for a bit 1.5 years ago and concluded that the concept is way too vague and seemed useless for alignment or interpretability purposes. If you have a good definition I’m open to hearing it.
This is another place where I’d say we don’t understand modularity well enough to give a good formal definition or operationalization yet.
Though I’d note here, and also above w.r.t. search, that “we don’t know how to give a good formal definition yet” is very different from “there is no good formal definition” or “the underlying intuitive concept is confused” or “we can’t effectively study the concept at all” or “arguments which rely on this concept are necessarily wrong/uninformative”. Every scientific field was pre-formal/pre-paradigmatic once.
> To me it looks like people abandoned behaviorism for pretty bad reasons. The ongoing replication crisis in psychology does not inspire confidence in that field’s ability to correctly diagnose bullshit.
>
> That said, I don’t think my views depend on behaviorism being the best framework for human psychology. The case for behaviorism in the AI case is much, much stronger: the equations for an algorithm like REINFORCE or DPO directly push up the probability of some actions and push down the probability of others.
Man, that is one hell of a bullet to bite. Much kudos for intellectual bravery and chutzpah!
That might be a fun topic for a longer discussion at some point, though not right now.