“Foundations of Intelligent Systems” not “Agent Foundations”
I don’t like the term “agent foundations” to describe the kind of research I am most interested in, because:
I am unconvinced that “agent” is the “true name” of the artifacts that would determine the shape of humanity’s long-term future
The most powerful artificial intelligence systems today do not cleanly fit into the agent ontology
Future foundation models are unlikely to cleanly conform to the agent archetype
Simulators/foundation models may be the first (and potentially final) form of transformative AI; salvation/catastrophe may be realised without the emergence of superhuman general agents
I worry that foundational theory that conditions too strongly on the agent ontology may end up being of limited applicability to the intelligent systems that would determine the long-term future of Earth-originating civilisation
I am persuaded by/sympathetic to composite modular architectures for transformative AI such as Drexler’s “Open Agency” or “Comprehensive AI Services”
I do not want us to build generally capable, strongly autonomous agents; my best-case outcome for AI is for it to amplify human cognition and capabilities, not to replace or supersede us.
I believe that cognition can be largely disconnected from volition
And that in principle, augmented/amplified humans (“cyborgs”) can be competitive with superhuman agents
I find the concept of an aligned sovereign (agent) very uncompelling. It runs counter to my values regarding autonomy and self-actualisation. Human enfeeblement would be a tragedy in its own right.
Research paradigms/approaches I see as belonging to “Foundations of Intelligent Systems”:
Quintin Pope and Alex Turner’s Shard Theory
John Wentworth’s “Basic Foundations for Agent Models”
John Wentworth’s “Natural Abstractions”
John Wentworth’s “Selection Theorems”
Andrew Critch’s “Boundaries”
DeepMind’s “Discovering Agents”
(Ambitious) Mechanistic Interpretability
Other science of deep learning approaches
Garrabrant and Demski’s Embedded Agency
MIRI’s Highly Reliable Agent Design
Please suggest more!
As always, let me know if you want me to publish this as a top-level post.