It’s an interesting framing, Dan: agent foundations for quantum superintelligence. To me, the motivation for agent foundations mostly comes from different considerations. Let me explain.
To my mind, agent foundations is not primarily about some mysterious future quantum superintelligences (though hopefully it will help us when they arrive!), but about real agents in this world, today.
That means humans and animals, but also many systems that are agentic to a degree, like markets, large organizations, and large language models. One could call these pseudo-agents, pre-agents, or egregores, but at the moment there is no accepted terminology for not-quite-agents, which may contribute to the persistent misconception that agent foundations is only concerned with expected-utility maximizers.
The reason research in agent foundations has so far mostly restricted itself to highly idealized ‘optimal’ agents is primarily mathematical tractability. Focusing on highly idealized agents also makes sense if one is focused on ‘reflectively stable’ agents, i.e. we’d like to know what agents converge to upon reflection. But the main reason we don’t study more complicated, realistic models of real-life agents is that the mathematics simply isn’t there yet.
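(For concreteness: the idealized ‘optimal’ agent in question is, in the simplest textbook formalization, an expected-utility maximizer. Assuming a utility function $u$ over outcomes and beliefs $p(s \mid a)$ about how actions lead to outcomes, such an agent chooses

$$a^{*} = \operatorname*{arg\,max}_{a \in \mathcal{A}} \; \mathbb{E}_{s \sim p(\cdot \mid a)}\big[u(s)\big].$$

This is just the generic sketch of the standard definition, not any particular agent foundations result; its very simplicity is part of what makes it mathematically tractable.)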
A different perspective on agent foundations is that of deconfusion: we are at present confused about many of the key concepts involved in aligning future superintelligent agents. We need to become less confused.
Another point of view on the importance of agent foundations: ultimately, it is inevitable that humanity will delegate more and more power to AIs. Ensuring the continued survival and flourishing of the human species is then less about interpretability and more about engineering reflectively stable, well-steered superintelligent systems. This is more about decision theory and (relatively) precise engineering, less about the online neuroscience of mechanistic interpretability. Perhaps this is what you meant by the waves of AI alignment.