The title of this dialogue promised a lot, but I’m honestly a bit disappointed by the content. It feels like the authors are discussing how to run particular mentorship programs, how to structure grants, and how research works in full generality, while no one is actually looking at the technical problems. All field-building efforts must depend on the importance and tractability of the technical problems, and this is just as true when the field is still developing a paradigm. I think a paradigm is established only when researchers with many viewpoints build a shared sense of which problems are important, then try many approaches until one successfully solves many such problems, thus proving the value of that approach. Wanting to find new researchers who will have totally new takes and start totally new illegible research agendas reflects a level of helplessness that I think is unwarranted: how can one be interested in agent foundations without some view on what problems are interesting?
I would be excited about a dialogue that goes like this, though the format need not be rigid:
What are the most important [1] problems in agent foundations, with as much specificity as possible?
Responses could include things like:
A sound notion of “goals with limited scope”: can’t nail down precise desiderata now, but humans have these all the time, we don’t know what they are, and they could be useful in corrigibility or impact measures.
Finding a mathematical model for agents that satisfies properties of logical inductors but also various other desiderata
Further study of corrigibility and capability of agents with incomplete preferences
Participants discuss how much each problem scratches their itch of curiosity about what agents are.
What techniques have shown promise in solving these and other important problems?
Does [infra-Bayes, Demski’s frames on embedded agents, some informal ‘shard theory’ thing, …] have a good success to complexity ratio?
Probably none of them do?
What problems would benefit the most from people with [ML, neuroscience, category theory, …] expertise?
[1]: (in the Hamming sense that includes tractability)
You may be positively surprised to know I agree with you. :)
For context, the dialogue feature just came out on LW. We gave it a try and this was the result. I think we mostly concluded that the dialogue feature wasn’t quite worth the effort. Anyway, I like what you’re suggesting and would be open to doing a dialogue about it!