I enjoyed this comment, thanks for thinking it through! Some comments:
If our superintelligent AI is just a bunch of well developed heuristics, it is unlikely that those heuristics will be generatively strategic enough to engage in super-long-term planning
This is not my belief. I think that powerful AI systems, even if they are a bunch of well-developed heuristics, will be able to do super-long-term planning (in the same way that I’m capable of it, and I’m a bunch of heuristics, or Eliezer is, to take your example).
Obviously this depends on how good the heuristics are, but I do think that heuristics will get to the point where they do super-long-term planning, and my belief that we’ll be safe by default doesn’t depend on assuming that AI won’t do long-term planning.
I think Rohin would agree with this belief in heuristic kludges that are effectively agential despite not being a One True Algorithm.
Yup, that’s correct.
So I agree that we have a good chance of ensuring that this kind of AI is safe—mainly because I don’t think the level of heuristics involved invoke an AI take-off slow enough to clearly indicate safety risks before they become x-risks.
Should “I don’t think” be “I do think”? Otherwise I’m confused. With that correction, I basically agree.
However, I don’t think that machine-learned heuristics are the only way we can get highly dangerous agenty heuristics. We’ve made a lot of mathematical progress on understanding logic, rationality and decision theory and, while machine-learned heuristics may figure out approximately Perfect Reasoning Capabilities just by training, I think it’s possible that we can directly hardcode heuristics that do the same thing based on our current understanding of things we associate with Perfect Reasoning Capabilities.
I would be very surprised if this worked in the near term. Like, <1% in 5 years, <5% in 20 years, and really I want to say <1% that this is the first way we get AGI (no matter when), but I can’t actually be that confident.
My impression is that many researchers at MIRI would qualitatively agree with me on this, though probably with less confidence.
This is not my belief. I think that powerful AI systems, even if they are a bunch of well-developed heuristics, will be able to do super-long-term planning (in the same way that I’m capable of it, and I’m a bunch of heuristics, or Eliezer is, to take your example).
Thanks for replying! Yeah, I intended that statement to be more of an elaboration on my own perspective than to imply that it represented your beliefs. I also agree that it’s wrong in the context of the superintelligent AI we are discussing.
Should “I don’t think” be “I do think”? Otherwise I’m confused.
Yep! Thanks for the correction.
I would be very surprised if this worked in the near term. Like, <1% in 5 years, <5% in 20 years and really I want to say < 1% that this is the first way we get AGI (no matter when)
Huh, okay… On reflection, I agree that directly hardcoded agent-y heuristics are unlikely, because scaling up AI and compute tends to beat hand-coding. However, I continue to think that mathematicians may be able to use their knowledge of probability and logic to steer learned heuristics toward unusually agent-y behavior quickly enough to pose surprising x-risks.
This mainly boils down to my understanding that similarly well-performing but different heuristics for agential behavior may have very different potential to generalize to agential behavior on longer time-scales and chains of reasoning than those they were trained on. Consequently, I think there are particular ways of defining an AI’s objectives and architecture that are uniquely suited to it becoming generally agential over arbitrarily long time-frames and chains of reasoning.
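To make that first point concrete, here is a deliberately tiny toy sketch (my own construction, not something from this thread; the environment and all names are made up for illustration): two hand-coded heuristics that score identically on the short planning horizon they were selected on, yet behave very differently once the horizon is extended.

```python
# Toy sketch (illustrative only): two heuristics that perform identically on a
# short "training" horizon but generalize to very different long-horizon behavior.

NEAR_POS, NEAR_REWARD = 1, 1.0    # small prize one step to the right
FAR_POS, FAR_REWARD = -6, 10.0    # large prize six steps to the left


def run(policy, horizon):
    """Walk a 1-D line from position 0 until a reward is reached or time runs out."""
    pos = 0
    for step in range(horizon):
        pos += policy(pos, horizon - step)
        if pos == NEAR_POS:
            return NEAR_REWARD
        if pos == FAR_POS:
            return FAR_REWARD
    return 0.0


def greedy(pos, steps_left):
    """Heuristic A: always step toward the nearest reward."""
    return 1 if abs(NEAR_POS - pos) <= abs(FAR_POS - pos) else -1


def farsighted(pos, steps_left):
    """Heuristic B: head for the big reward whenever it is still reachable in time."""
    if abs(FAR_POS - pos) <= steps_left:
        return -1
    return greedy(pos, steps_left)


if __name__ == "__main__":
    for horizon in (2, 10):
        print(horizon, run(greedy, horizon), run(farsighted, horizon))
    # horizon 2:  both score 1.0 -- indistinguishable on the short horizon
    # horizon 10: greedy still scores 1.0, farsighted scores 10.0
```

On the short horizon the two heuristics are indistinguishable by performance; only the longer horizon reveals which one acts like a long-horizon agent, which is the sense in which "similarly well-performing" heuristics can generalize very differently.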
However, I think we can address this kind of risk with the same safety solutions that could help us deal with AI that just has significantly better reasoning capabilities than us (but whose reasoning capabilities have not fully generalized!). Paul Christiano’s work on amplification, for instance.
So the above is only a concern if people a) deliberately try to get AI in the most reckless way possible, and b) get lucky enough that it doesn’t get bottlenecked somewhere else. I’ll buy the low estimates you’re providing.