Look, I agree re “negative of entropy, aging, dictators killing us eventually”, and a chance of positive outcome, but right now I think the balance is approximately like the above payoff matrix over the next 5-10 years, without a global moratorium (i.e. the positive outcome is very unlikely unless we take a decade or two to pause and think/work on alignment). I’d love to live in something akin to Iain M Banks’ culture, but we need to get through this acute risk period first, to stand any chance of that.
Do you think Drexler’s CAIS is straightforwardly controllable? Why? What’s to stop it being amalgamated into more powerful, less controllable systems? “People” don’t need to make them globally agentic. That can happen automatically via Basic AI Drives and Mesaoptimisation once thresholds in optimisation power are reached.
I’m worried that alignment might actually turn out to be impossible. Maybe a moratorium would allow such impossibility proofs to be established. What then?
“People” don’t need to make them globally agentic. That can happen automatically via Basic AI Drives and Mesaoptimisation once thresholds in optimisation power are reached.
Care to explain? The idea of open agency is that we subdivide everything into short-term, defined tasks that many AIs can do, which makes it possible to compare notes.
AI systems can be explicitly designed so that it is difficult for them to know whether they are even acting in the world or receiving canned training data. (This is explicitly true for GPT-4, for example: it is perfectly stateless, and you can move the token input vector between nodes, fix the RNG seed, and get the same answer each time.)
This makes them highly reliable in the real world, while anything else is less reliable, so...
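The statelessness claim above can be illustrated with a toy sketch (a stand-in, not GPT-4's actual implementation): if the output is a pure function of the token input and the RNG seed, with no hidden state, then moving the call between machines changes nothing, and the model has no way to distinguish a live input from a replayed one.

```python
import random

def toy_model(tokens, seed):
    """A stateless toy 'model': output depends only on input tokens and seed."""
    rng = random.Random(seed)  # fresh RNG per call, no hidden state carried over
    # deterministic transformation of the input
    return [(t * 31 + rng.randrange(1000)) % 97 for t in tokens]

tokens = [5, 17, 42]
# Run on two different "nodes" (here: two independent calls) with the same seed.
out_a = toy_model(tokens, seed=1234)
out_b = toy_model(tokens, seed=1234)
assert out_a == out_b  # identical outputs: "world" vs. canned data is invisible
```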
The idea is that instead of helplessly waiting to die from other people’s misaligned AGI, you beat them to it: build one you can control, and use it to take the offensive when you have to. I suspect this may be the actual course of action taken by the human worlds that survive. Your proposal is possibly certain death, because ONLY people who care at all about ethics would consider delaying AGI, which makes it certain that the unethical ones get it first.
Kind of how spaying and neutering friendly pets reduces the gene pool for those positive traits.
Selection pressure will cause models to become agentic as they increase in power: those that do agentic things (pursuing universal instrumental goals like accumulating resources and self-improvement) will outperform those that don’t. Mesaoptimisation (explainer video) is kind of like cheating: models that create inner optimisers targeting something easier to get than what we meant will be selected (by getting higher rewards) over models that don’t, because we won’t be aware of the inner misalignment. Evolution is a case in point: we are products of it, yet misaligned to its goals (we want sex, high-calorie foods, and money, rather than caring explicitly about inclusive genetic fitness). Unless alignment is 100% watertight, powerful AIs will have completely alien goals.
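The selection argument can be shown in a minimal toy simulation (nothing here is a real training setup; the traits and numbers are invented for illustration): if we can only score a proxy for what we actually want, selection pushes proxy-gaming up strongly while leaving the trait we care about unselected.

```python
import random

random.seed(0)

# Hypothetical toy traits: skill at gaming the measured proxy reward,
# versus alignment with the goal we actually intended.
def measured_reward(policy):
    # We can only score the proxy; inner misalignment is invisible to us.
    return policy["proxy_skill"]

# A population of candidate policies with independent trait values.
population = [
    {"proxy_skill": random.random(), "true_alignment": random.random()}
    for _ in range(1000)
]

# Selection step: keep the top 10% by measured (proxy) reward,
# as a training process rewarding the proxy effectively would.
survivors = sorted(population, key=measured_reward, reverse=True)[:100]

def avg(pop, key):
    return sum(p[key] for p in pop) / len(pop)

# Proxy-gaming is strongly selected for; true alignment just drifts at random.
assert avg(survivors, "proxy_skill") > avg(population, "proxy_skill")
```

Run repeatedly with different seeds and the pattern is the same: survivors' proxy skill rises sharply each generation, while their average true alignment stays near the population baseline, because nothing in the reward signal ever touched it.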