the claim which gets clamped to True is not “this research direction will/can solve alignment” but instead “my research is high value”.
This agrees with something like half of my experience.
that their research is maybe a useful part of a bigger solution which involves many other parts, or that their research is maybe a useful step toward something better.
Right, I think of this response as arguing that streetlighting is a good way to do large-scale pre-paradigm science projects in general. And I have to somewhat agree with that.
Then I argue that AGI alignment is somewhat exceptional: 1. cruel deadline, 2. requires understanding as-yet-unconceived aspects of Mind. Point 2 of exceptionality goes through things like the alienness of creativity, recursive self-improvement (RSI), reflective instability, the fact that we don't understand how values sit in a mind, etc., and that's the part that gets warped away.
I do genuinely think that the 2024 field of AI alignment would eventually solve the real problems via collective iterative streetlighting. (I even think it would eventually solve it in a hypothetical world where all our computers disappeared, if it kept trying.) I just think it’ll take a really long time.
being a useful part of a bigger solution (which they don’t know the details of) is itself a rather difficult design constraint which they have not at all done the work to satisfy
Right, exactly. (I wrote about this in my ~~opaque gibberish~~ philosophically precise style here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html#1-summary)