I’ve definitely also seen the failure mode where someone is exclusively or overly focused on “the puzzles of agency” without having an edge in linking those questions up with AI risk/alignment. Some ways of asking about/investigating agency are more relevant to alignment than others, so I think it’s important that there is a clear/strong enough “signal” from the target domain (here: AI risk/alignment) to guide the search/research directions.
I disagree—I think that we need more people on the margin who are puzzling about agency, relative to those who are backchaining from a particular goal in alignment. Like you say elsewhere, we don’t yet know what abstractions make sense here; without knowing what the basic concepts of “agency” are it seems harmful to me to rely too much on top-down approaches, i.e., ones that assume something of an end goal.
In part that’s because I think we need higher-variance conceptual bets here, and over-emphasizing particular problems in alignment risks correlating people’s minds. In part it’s because I suspect that there are surprising, empirical things left to learn about agency that we’ll miss if we prefigure the problem space too much.
But also: many great scientific achievements were preceded by bottom-up work (e.g., Shannon, Darwin, Faraday), and afaict their open-ended, curious explorations are what laid the groundwork for their later theories. I think it’s a real mistake to hold all work to the same standards of legible feedback loops/backchained reasoning/clear path to impact/etc., given that so many great scientists did not work that way. Certainly, once we have a bit more of a foundation, this sort of thing seems good to me (and good to do in abundance). But before we know what we’re even talking about, over-emphasizing narrow, concrete problems risks producing the wrong kind of conceptual research: the kind of “predictably irrelevant” work that Alexander gestures towards.