I’m starting to suspect that AGI might require decision-theoretic insights about reflection in order to be truly dangerous.
A chess computer doesn’t need reflection to win at chess. An AGI doesn’t need reflection to make its own causal models. So if the game is ‘eat the earth’, an unreflective AGI seems like a contender. One might argue that it needs to ‘understand’ reflection in order to understand the human beings that might oppose it, or to model its own nature, but I think the necessary capacities could emerge in an indirect way. In making a causal model of an external reflective intelligence it might need to worry about the halting problem, but computational resource bounds are a real-world issue that will in any case require it to have heuristics for noticing when a particular subtask is taking up too much time. As for self-modelling, it may be capable of forming partial self-models relevant to reasoning correctly about the implications of self-modification (or just the implications of damage to itself), simply by applying standard causal modelling to its own physical vicinity, i.e. without any special data representations or computational architecture designed to tell it ‘this item represents me, myself, and not just another object in the world’.
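As a toy sketch of the kind of cutoff heuristic meant here (purely illustrative; the function names and numbers are made up, not anything proposed in the thread), a resource-bounded reasoner can simply abandon a subtask once it exhausts a time budget, rather than trying to decide whether the process it is modelling halts:

```python
import time

def bounded_subtask(step_fn, budget_seconds=1.0):
    """Run step_fn repeatedly until it yields a result or the time budget runs out."""
    deadline = time.monotonic() + budget_seconds
    state = None
    while time.monotonic() < deadline:
        state, result = step_fn(state)
        if result is not None:
            return result       # subtask converged within the budget
    return None                 # gave up: treat the modelled process as unresolved

# Hypothetical example: brute-force simulation of an opaque process,
# cut off by the budget instead of risking an unbounded loop.
def simulate_step(state):
    state = 0 if state is None else state + 1
    return state, ("converged" if state > 10_000_000 else None)

print(bounded_subtask(simulate_step, budget_seconds=0.1))
```

The point is only that such a crude heuristic sidesteps the halting problem in practice, without any explicit machinery for reflection.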
It would be desirable to have a truly rigorous understanding of both these issues, but just thinking about them informally already tells me that there’s no safety here; we can’t say “whew, at least that isn’t possible”. Finally, a world-eating AGI equipped with knowledge of physics and a head start in brute power might never have to worry about reflection, because human beings and their machines are just too easy to swat aside. You don’t need to become an entomologist before you can stomp an insect.
I agree with everything you’ve written as far as my modal hypothesis goes, but I also think we’re going to lose in that case, so I’ve sort of renormalized to focus my attention at least somewhat more on worlds where for some reason academic/industry AI approaches don’t work, even if that requires some sort of deus ex machina. My intuition says that highly recursive, narrow-AI-style techniques should give you AGI, but to some extent this does go against e.g. the position of many philosophers of mind, and in this case I hope they’re right. Trying to imagine intermediate scenarios led me to think about this kind of thing.
It would of course be incredibly foolish to entirely write off worlds where AGI is relatively easy, but we should also consider cases where, for whatever reason, it isn’t; and if it isn’t, then SingInst is in a uniquely good position to build uFAI.
I’ve sort of renormalized to focus my attention at least somewhat more on worlds where for some reason academic/industry AI approaches don’t work, even if that requires some sort of deus ex machina
I apologize for asking, but I just want to clarify something. When you write ‘deus ex machina’, you’re not solely using the term in a metaphorical sort of way, are you? Because, if you mean what it sort of sounds like you mean, at least some of your public positions suddenly make a lot more sense.
Yes, literal deus ex machina is one scenario which I find plausible.