What’s the consensus on David Shapiro and his heuristic imperatives design? He seems to consider it the best idea we’ve got for alignment and to be pretty optimistic about it, but I haven’t heard anyone else talking about it. Either I’m completely misunderstanding what he’s talking about, or he’s somehow found a way around all of the alignment problems.
Video of him explaining it here for reference, and thanks in advance:
Watched the video. He’s got a lot of the key ideas and vocabulary. Orthogonality, convergent instrumental goals, the treacherous turn, etc. The fact that these language models have some understanding of ethics and nuance might be a small ray of hope. But understanding is not the same as caring (orthogonality).
However, he does seem to be lacking in the security mindset, imagining only how things can go right, and seems to assume that we’ll have a soft takeoff with a lot of competing AIs, i.e. ignoring the FOOM problem caused by an overhang which makes a singleton scenario far more likely, in my opinion.
But even if we grant him a soft takeoff, I still think he’s too optimistic. Even that may not go well. Even if we get a multipolar scenario, with some of the AIs on our side, humanity likely becomes collateral damage in the ensuing AI wars. Those AIs willing to burn everything else in pursuit of simple goals would have an edge over those with more to protect.
I watched the video, and appreciate that he seems to know the literature quite well and has thought about this a fair bit; he gave a really good introduction to some of the known problems. This particular video doesn't go into much detail on his proposal, and I'd have to read his papers to delve further. That seems worthwhile, so I'll add some to my reading list.
I can still point out the biggest ways in which I see him being overconfident:
Only considering the multi-agent world. Though he's right that there already are, and will be, many deployed AI systems, that does not translate into there being many deployed state-of-the-art systems. As long as training and inference costs keep increasing (as they have), then on the contrary fewer and fewer actors will be able to afford training and deploying state-of-the-art systems, leading to very few (or one) significantly more powerful AGIs relative to the others (compare GPT-4 vs GPT-2).
Not considering the impact that governance and policy could have on this. This isn't just a tech thing where tech people can do whatever they want forever; regulation is coming. If we think we have higher chances of survival in highly regulated worlds, then the AI safety community will do a lot of work to ensure fast and effective regulation (to the extent possible). The genie is not yet out of the bottle for powerful AGI: governments can control compute, regulate powerful AI as weapons, and set up international agreements to ensure this.
The hope that game theory ensures that AI developed under his principles would be good for humans. There's a crucial gap in going from the real world to mathematical models. Game theory might predict good results under certain conditions, rules, and assumptions, but many of these aren't true of the real world, and simple game theory does not yield accurate predictions (e.g., have people play various social games and they won't act the way game theory says; see the sketch below). Stated strongly, putting your hope in game theory is about as hard as putting your hope in alignment. There's nothing magical about game theory that makes it simpler to get working than alignment, and it has been studied extensively by AI researchers (e.g., this is why Eliezer calls himself a decision theorist and writes a lot about economics) with no clear "we've found a theory which empirically works robustly and in which we can place the fate of humanity".
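To make that gap concrete, here is a minimal sketch of the classic ultimatum game in Python. The rejection threshold and payoffs are illustrative numbers of my own choosing, not data from any particular study: subgame-perfect equilibrium says a responder should accept any positive offer, yet experimental subjects routinely reject offers they consider unfair, so the equilibrium prediction misses what actually happens.

```python
# Ultimatum game: a proposer splits a pie of 10 with a responder, who can
# accept or reject. If the responder rejects, both get nothing. Classical
# game theory (subgame-perfect equilibrium) predicts the responder accepts
# ANY positive offer, so the proposer offers the minimum unit.

PIE = 10

def equilibrium_prediction():
    # Equilibrium logic: responder accepts any offer > 0,
    # so the proposer offers the smallest possible amount.
    offer = 1
    return {"proposer": PIE - offer, "responder": offer}

def empirical_response(offer, rejection_threshold=3):
    # Stylized empirical behavior (illustrative threshold, not real data):
    # offers well below a fair split tend to get rejected out of fairness
    # norms, even though rejecting is "irrational" in the model.
    if offer < rejection_threshold:
        return {"proposer": 0, "responder": 0}  # rejection destroys the pie
    return {"proposer": PIE - offer, "responder": offer}

print("Equilibrium prediction:", equilibrium_prediction())
print("Minimal offer, stylized humans:", empirical_response(1))  # both get 0
print("Fairer offer, stylized humans:", empirical_response(4))
```

The point isn't the specific numbers; it's that as soon as agents care about things the model omits (fairness, reputation, spite), equilibrium analysis stops being a safe foundation to build on.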
I work in AI strategy and governance, and feel we have better chances of survival in a world where powerful AI is limited to extremely few actors, with international supervision and cooperation over the guidance and use of these systems, and with extreme efforts put into engineering safety, corrigibility, etc. I do not trust predictions about how complex systems turn out (which is what real multi-agent problems are), and I don't think we can control these well in most relevant cases.