Gotcha. Either way, I think this is a great idea for a thread, and I appreciate you making it. :)
To avoid confusion, when I say “agent foundations” I mean one of these things:
1. Work that’s oriented toward the original “Agent Foundations” agenda, which put a large focus on “highly reliable agent design” (usually broken up into logical uncertainty and naturalized induction, decision theory, and Vingean reflection), and also tends to apply an HRAD-informed perspective to understanding things like corrigibility and value learning.
2. Work that’s oriented toward the “Embedded Agency” confusions, which are mostly the same as the original “Agent Foundations” agenda, plus subsystem alignment.
We originally introduced the term “agent foundations” because (a) some people (I think Stuart Russell?) thought it was a better way of signposting the kind of alignment research we were doing, and (b) we wanted to distinguish our original research agenda from the 2016 “Alignment for Advanced Machine Learning Systems” agenda (AAMLS).
A better term might have been “agency foundations,” since you almost certainly don’t want your first AGI systems to be “agentic” in every sense of the word, but you do want to fundamentally understand the components of agency (good reasoning, planning, self-modeling, optimization, etc.). The idea is to understand how agency works without actually building a non-task-directed, open-ended optimizer (until you’ve gotten a lot of practice with more limited, easier-to-align AGI systems).