My impression, from conversations with many people, is that the claim which gets clamped to True is not “this research direction will/can solve alignment” but instead “my research is high value”. So when I’ve explained to someone why their current direction is utterly insufficient, they usually won’t deny that some class of problems goes unaddressed. They’ll instead tell me that the research still seems valuable even though it isn’t addressing a bottleneck, or that their research is maybe a useful part of a bigger solution which involves many other parts, or that their research is maybe a useful step toward something better.
(Though admittedly I usually try to “meet people where they’re at”, by presenting failure-modes which won’t parse as weird to them. If you’re just directly explaining e.g. dangers of internal RSI, I can see where people might instead just assume away internal RSI or some such.)
… and then if I were really putting in effort, I’d need to explain that e.g. being a useful part of a bigger solution (which they don’t know the details of) is itself a rather difficult design constraint which they have not at all done the work to satisfy. But usually I wrap up the discussion well before that point; I generally expect that at most one big takeaway from a discussion can stick, and if they already have one then I don’t want to overdo it.
the claim which gets clamped to True is not “this research direction will/can solve alignment” but instead “my research is high value”.
This agrees with something like half of my experience.
that their research is maybe a useful part of a bigger solution which involves many other parts, or that their research is maybe a useful step toward something better.
Right, I think of this response as arguing that streetlighting is a good way to do large-scale pre-paradigm science projects in general. And I have to somewhat agree with that.
Then I argue that AGI alignment is somewhat exceptional: 1. cruel deadline, 2. requires understanding as-yet-unconceived aspects of Mind. Point 2 of exceptionality goes through things like alienness of creativity, RSI, reflective instability, the fact that we don’t understand how values sit in a mind, etc., and that’s the part that gets warped away.
I do genuinely think that the 2024 field of AI alignment would eventually solve the real problems via collective iterative streetlighting. (I even think it would eventually solve it in a hypothetical world where all our computers disappeared, if it kept trying.) I just think it’ll take a really long time.
being a useful part of a bigger solution (which they don’t know the details of) is itself a rather difficult design constraint which they have not at all done the work to satisfy
Right, exactly. (I wrote about this in my ~~opaque gibberish~~ philosophically precise style here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html#1-summary)