RogerDearnaley comments on How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley 1 Dec 2023 5:51 UTC
3 points
3
I’ve seen a number of cases where something that helps alignment also helps capabilities, or vice versa, and also cases where people are worrying a lot about something as an alignment problem that looks to me like primarily a capabilities problem (so given how few alignment engineers we have, maybe we should leave solving it to all the capabilities engineers). Generally I think we’re just not very good at predicting the difference, and tend to want to see this as an either-or taboo rather than a spectrum buried inside a hard-to-anticipate tech tree. In general, capabilities folks also want to control their AI (so it won’t waste tokens, do weird stuff, or get them sued or indicted). The big cross-purposes concerns tend to come mostly from deceit, sharp left turn, and Foom scenarios, where capabilities seem just fine until we drive off the cliff. What I think we need (and even seems to be happening in many orgs, with a few unfortunate exceptions) is for all the capabilities engineers to be aware that alignment is also a challenge and needs to be thought about.
What links here?
- Noosphere89's comment on Joshua Achiam Public Statement Analysis by Zvi (10 Oct 2024 16:20 UTC; -2 points)