Given my current understanding of compute hardware and of the software behind current AI systems, I don’t see a path towards a ‘locked-in conscience’ that a bad actor with full control over the hardware and software couldn’t remove. Even chips soldered to a board can be removed, replaced, or hacked.
My best guess is that the only approach to making an ‘AI conscience’ robust to bad actors is to make both the software and the hardware inaccessible to them. In other words, it won’t be feasible for open-weights models, only for closed-weights models accessed through controlled APIs. APIs still allow for fine-tuning! I don’t think we lose utility by having all private uses go through APIs, so long as there isn’t undue censorship on the API.
I think figuring out how to build an API that restricts things like information pertaining to the creation of weapons of mass destruction, but not information pertaining to personal lifestyle choices (e.g. pornography), would be a very important step towards reducing the public pressure for open-weights models.
Thanks for the comment. You might be right that any hardware/software can ultimately be tampered with, especially if an ASI is driving or helping with the jailbreaking process. It seems likely that silicon-based GPUs will be the hardware that gets us to the first AGIs, but this isn’t an absolute certainty, since people are working on other routes such as thermodynamic computing. That makes things harder to predict, but I don’t think it invalidates your take. My not-very-well-researched initial thought was something like this (chips that self-destruct when tampered with).
I envision people having AGI-controlled robots at some point, which may make it harder to keep the software and hardware inaccessible to people, unless the robot couldn’t operate without an internet connection, i.e., part of its hardware/software was in the cloud. Even then, the hardware in the robot itself could likely still be tampered with, so it still seems like we’d want some kind of self-destructing chip to deter tampering, even if this ultimately only buys us time until AGI+s/ASIs figure out a way around it.