FWIW, I’m a Staff ML SWE, interested in switching to research engineering, and I’d love to make these things happen — either at a hyperscaler with ample resources for it, or failing that, at something like Eleuther or an alignment research lab.
Some of this stuff technically accelerates capabilities (or more specifically, the elicitation of existing capabilities), but I think it also lies on a more fundamentally reliable path through the tech tree. The sooner the industry embraces it, the less time it spends in other parts of the tech tree that are more prone to misoptimization failures, and the less likely it is that someone figures out how to make those misoptimization failures way more efficient.
I suspect there’s a crux about the path of capabilities development in there for a lot of people; I should probably get around to writing a post about the details at some point.
I’ve seen a number of cases where something that helps alignment also helps capabilities, or vice versa, and also cases where people worry a lot about something as an alignment problem that looks to me like primarily a capabilities problem (so given how few alignment engineers we have, maybe we should leave solving it to all the capabilities engineers). Generally I think we’re just not very good at predicting the difference, and tend to treat it as an either-or taboo rather than a spectrum buried inside a hard-to-anticipate tech tree.

Capabilities folks also want to control their AI (so it won’t waste tokens, do weird stuff, or get them sued or indicted). The big cross-purposes concerns come mostly from deceit, sharp left turn, and Foom scenarios, where capabilities seem just fine until we drive off the cliff. What I think we need (and this even seems to be happening in many orgs, with a few unfortunate exceptions) is for all the capabilities engineers to be aware that alignment is also a challenge and needs to be thought about.
> FWIW, I’m a Staff ML SWE, interested in switching to research engineering, and I’d love to make these things happen — either at a hyperscaler with ample resources for it, or failing that, at something like Eleuther or an alignment research lab.
I think that’d be great!