I think this post was quite helpful. It does a good job laying out a fairly complete picture of a pretty reasonable safety plan, and the main sources of difficulty. I basically agree with most of the points. Along the way, it makes several useful points, for example introducing the “action risk vs inaction risk” frame, which I use constantly. This post is probably one of the first ten posts I’d send someone on the topic of “the current state of AI safety technology”.
I think that I somewhat prefer the version of these arguments that I give in e.g. this talk and other posts.
My main objection to the post is the section about decoding and manipulating internal states; I don’t think that anything that I’d call “digital neuroscience” would be a major part of ensuring safety if we had to do so right now.
In general, I think this post is kind of sloppy about distinguishing between control-based and alignment-based approaches to making the use of a particular AI safe, and this makes its points weaker.