Some thoughts on the post, since I want to collect everything in one place.
For this specifically:
The field of alignment seems to be increasingly dominated by interpretability. (and obedience[2])
Obedience is quite literally the classical version of the AI alignment problem, or at least very entangled with it. Interpretability, IMO, got a lot of its initial boost from Distill, combined with being a relatively clean problem formulation that makes alignment more scalable.
On footnote 3, I have to point out that the postulated sharp left turn in humans relative to chimps has far less evidence behind it than the model in which human success is basically attributable to culture. On that model, we are very good at imitating others without a true causal model and at distilling cultural knowledge down, and we also have a very good body plan for tool use and can dissipate heat better, which lets us scale up algorithms well.
I don’t totally like Henrich’s book The Secret of Our Success, but I do think that at least some of its results, like chimps essentially equaling humans on IQ and only losing hard when social competence was required, are surprising under the sharp left turn view.
While the science of human and animal brains is incomplete at this juncture, we are starting to realize that animals in fact have much more intelligence than people used to think. For mammals in particular, the algorithmic cores are also fairly similar in genetic space, and the only real differentiator for humans at this point is that our cultural knowledge, sparked by population growth and language, booms every generation.