The lack of a robust, highly general paradigm for reasoning about AGI models is the current greatest technical problem, although it is not what most people are working on.
What architectural features of contemporary AI models will recur in future models that pose an existential risk?
What behavioral patterns of contemporary AI models will be shared with future models that pose an existential risk?
Is there a useful and general mathematical/physical framework that describes how agentic, macroscopic systems process information and interact with their environment?
Does terminology adopted by AI Safety researchers like “scheming”, “inner alignment” or “agent” carve nature at the joints?