David Reber comments on EIS V: Blind Spots In AI Safety Interpretability Research

David Reber 17 Feb 2023 16:08 UTC
Strongly upvoting this for being a thorough and carefully cited explanation of how the safety/alignment community doesn’t engage enough with relevant literature from the broader field, likely at the cost of duplicated work, suboptimal research directions, and less exchange and diffusion of important safety-relevant ideas.
Ditto. I’ve recently started moving into interpretability / explainability and spent the past week skimming the broader XAI literature, so the timing of this carefully cited post is very useful to me.
I see similar things happening with causality in general: it seems to me that (as a first-order heuristic) much of the Alignment Forum’s reference point for causality is frozen at Pearl’s 2008 textbook, missing what I consider to be most of the valuable recent contributions and expansions in the field.
- Example: Finite Factored Sets seems to be reinventing causal representation learning [for a good intro, see Schölkopf 2021], and it seems to me that the broader field is outpacing FFS on its own goals. FFS promises some theoretical gains (apparently, inferring causality where Pearl-esque frameworks can’t), but I’m no longer as sure that this holds.
- Counterexample(s): the Causal Incentives Working Group and David Krueger’s lab, for instance. Notably, these are embedded in academia, where there’s more of a culture (and incentive) to thoroughly relate one’s work to previous work. (These aren’t the only ones, just two that came to mind.)
- Alexander Gietelink Oldenziel 4 May 2023 23:21 UTC
  I was intrigued by your claim that FFS is already subsumed by work in academia. I clicked the link you provided, but from a quick skim it doesn’t seem to cover FFS or anything beyond the usual Pearl causality story, as far as I can tell. Maybe I am missing something. Could you point to a specific page where you think FFS is being subsumed?
  - David Reber 25 May 2023 16:49 UTC
    Also, just to make sure we share a common understanding of Schölkopf 2021: wouldn’t you agree that asking “how do we do causality when we don’t even know at what level of abstraction to define causal variables?” goes beyond the “usual Pearl causality story” as it is typically summarized in FFS posts? It certainly goes beyond Pearl’s well-known works.
  - David Reber 25 May 2023 16:45 UTC
    My claim isn’t that “FFS is already subsumed by work in academia”: as I acknowledge, FFS is a different theoretical framework from Pearl-based causality. I view them as two distinct approaches, but my claim is that they are motivated by the same question (namely, how to do causal representation learning).
    I intentionally linked an introductory survey of the Pearl-ish approach to causal representation learning: I meant to indicate that many academic researchers are already studying the question “what does it mean to study causality if we don’t have pre-defined variables?”
    It may be that FFS ends up contributing novel insights above and beyond <Pearl-based causal rep. learning>, but a priori I expect this to occur only if FFS researchers are familiar with the existing literature, which I haven’t seen mentioned in any FFS posts.
    My line of thinking is: It’s hard to improve on a field you aren’t familiar with. If you’re ignorant of the work of hundreds of other researchers who are trying to answer the same underlying question you are, odds are against your insights being novel / neglected.
    - Alexander Gietelink Oldenziel 25 May 2023 20:12 UTC
      Scott Garrabrant conceived of FFS as an extension and generalization of Pearlian causality that answers questions the Pearlian framework does not handle well. He is aware of Pearl’s work and explicitly builds on it, so it is less a distinct approach than an extension. The paper you mentioned discusses the problem of figuring out what the right variables are but, as far as I can tell, poses no solution. That shouldn’t be surprising, because the problem is very hard. Many people have thought about it, but there is only one Garrabrant.
      
      I do agree with your overall perspective that people in alignment are quite insular, unaware of the literature and often reinventing the wheel.