the claim which gets clamped to True is not “this research direction will/can solve alignment” but instead “my research is high value”.
This agrees with something like half of my experience.
that their research is maybe a useful part of a bigger solution which involves many other parts, or that their research is maybe a useful step toward something better.
Right, I think of this response as arguing that streetlighting is a good way to do large-scale pre-paradigm science projects in general. And I have to somewhat agree with that.
Then I argue that AGI alignment is somewhat exceptional: 1. cruel deadline, 2. requires understanding as-yet-unconceived aspects of Mind. Point 2 of exceptionality goes through things like the alienness of creativity, recursive self-improvement (RSI), reflective instability, the fact that we don't understand how values sit in a mind, etc., and that's the part that gets warped away.
I do genuinely think that the 2024 field of AI alignment would eventually solve the real problems via collective iterative streetlighting. (I even think it would eventually solve it in a hypothetical world where all our computers disappeared, if it kept trying.) I just think it’ll take a really long time.
being a useful part of a bigger solution (which they don’t know the details of) is itself a rather difficult design constraint which they have not at all done the work to satisfy
Right, exactly. (I wrote about this in my ~~opaque gibberish~~ philosophically precise style here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html#1-summary)