One source of our disagreement: I would describe evolution as a type of local search. The difference is that it’s local with respect to the parameters of a whole population, rather than an individual agent. So this does introduce some disanalogies, but not particularly significant ones (to my mind). I don’t think it would make much difference to my heuristic if we imagined that humans had evolved via gradient descent over our genes instead.
In other words, I like the heuristic of backchaining to local search, and I think of it as a subset of my heuristic. The thing it’s missing, though, is that it doesn’t tell you which approaches will actually scale up to training regimes which are incredibly complicated, applied to fairly intelligent agents. For example, impact penalties make sense in a local search context for simple problems. But to evaluate whether they’ll work for AGIs, you need to apply them to massively complex environments. So my intuition is that, because I don’t know how to apply them to the human ancestral environment, we also won’t know how to apply them to our AGIs’ training environments.
Similarly, when I think about MIRI’s work on decision theory, I really have very little idea how to evaluate it in the context of modern machine learning. Are decision theories the type of thing which AIs can learn via local search? Seems hard to tell, since our AIs are so far from general intelligence. But I can reason much more easily about the types of decision theories that humans have, and the selective pressures that gave rise to them.
As a third example, my heuristic endorses Debate due to a high-level intuition about how human reasoning works, in addition to a low-level intuition about how it can arise via local search.
So if I try to summarize your position, it’s something like: backchain to local search for the simple, single-AI cases, and then think about aligning humans for the scaled-up, multi-agent version? That makes much more sense, thanks!
I also definitely see why your full heuristic doesn’t feel immediately useful to me: because I mostly focus on the simple and single-AI case. But I’ve been thinking more and more (in part thanks to your writing) that I should allocate more thinking time to the more general case. I hope your heuristic will help me there.
Cool, glad to hear it. I’d clarify the summary slightly: I think all safety techniques should include at least a rough intuition for why they’ll work in the scaled-up version, even when current work on them only applies them to simple AIs. (Perhaps this was implicit in your summary already, I’m not sure.)