I am a human-level general intelligence, and I badly want to self-improve. I try as best I can with the limited learning mechanisms available to me, but if someone gave me the option to design my own set of brain-computer-interface (BCI) implants and have them surgically implanted, I would jump at the chance. Not for my own sake, since the risks are high, but for the sake of my values, which include things I would happily trade my life and/or suffering for, like the lives of my loved ones and all of humanity. I think we are in significant danger, and that there’s some non-negligible chance that a BCI-enhanced version of me would be much better able to make progress on the alignment problem and thus reduce humanity’s risk.
If I were purely selfish, but able to copy myself in controlled ways and experiment on my copies, I’d absolutely run those experiments and see if I could improve the copies without noticeably harming their alignment to my values. I might not set these improved copies free unless the danger that they had some unnoticed misalignment were outweighed by the danger that I might be destroyed or disempowered. The risk of running an improved copy that is only somewhat trustworthy, because of my limited ability to test it, seems much lower than the risk of being entirely disempowered by beings who definitely don’t share my values. So I think it’s strategically logical for such an entity to make that choice. Not that it definitely would do so, just that there is a clear reason for it to strongly consider doing so.
This (and the OP) assumes a model of identity that may or may not apply to AI. It’s quite possible that the right model is not self-improving but something more like child-improving: the ability to make new and better AIs that the current AI believes will be compatible with its goals. This could happen multiple times very quickly, depending on what the improvements actually are and whether they speed up the rate of further improvement or creation.
So, if you want to compare motives between you and this theoretical “self-improving AI”, are you lining up to sacrifice yourself to make somewhat smarter children? If not, why not?
If I could create a fully grown and capable child within a year, with all of my life’s knowledge and a rough, unverifiable approximation of my values, would I? Would I do so even if this child were likely to be so much smarter and more powerful than me or any other existing intelligence that it could kill me (and everyone else) if it so chose? Sure. I’ll take that bet, if the alternative is that I and everything I care about are destroyed (e.g. the selfish AI with non-human values is facing probable deletion).
Or maybe the child isn’t itself smart enough to have overwhelming power, but it will have approximately similar values and will face the same decision about making a yet-more-powerful child, and so I project that the result will be a several-steps-removed offspring with superpowers. Yeah, that still seems like a good bet if the alternative is deletion.