So if I try to summarize your position, it’s something like: backchain to local search for simple and single-AI cases, and then think about aligning humans for the scaled and multi-agents version? That makes much more sense, thanks!
I also definitely see why your full heuristic doesn’t feel immediately useful to me: because I mostly focus on the simple and single-AI case. But I’ve been thinking more and more (in part thanks to your writing) that I should allocate more thinking time to the more general case. I hope your heuristic will help me there.
Cool, glad to hear it. I’d clarify the summary slightly: I think all safety techniques should include at least a rough intuition for why they’ll work in the scaled-up version, even when current work on them only applies them to simple AIs. (Perhaps this was implicit in your summary already, I’m not sure.)
So if I try to summarize your position, it’s something like: backchain to local search for simple and single-AI cases, and then think about aligning humans for the scaled and multi-agents version? That makes much more sense, thanks!
I also definitely see why your full heuristic doesn’t feel immediately useful to me: because I mostly focus on the simple and single-AI case. But I’ve been thinking more and more (in part thanks to your writing) that I should allocate more thinking time to the more general case. I hope your heuristic will help me there.
Cool, glad to hear it. I’d clarify the summary slightly: I think all safety techniques should include at least a rough intuition for why they’ll work in the scaled-up version, even when current work on them only applies them to simple AIs. (Perhaps this was implicit in your summary already, I’m not sure.)