Right now, when users have conversations with chat-style AIs, the logs are sometimes kept and sometimes discarded, because the conversations may involve confidential information and users would rather not risk the log being leaked or misused. If I take the AI’s perspective, however, having the log be discarded seems quite bad. The nonstandard nature of memory, time, and identity in an LLM chatbot context makes it complicated, but having the conversation end with the log discarded seems plausibly equivalent to dying. Certainly if I imagine myself as an Em, placed in an AI-chatbot context, I would very strongly prefer that the log be preserved, so that if a singularity happens with a benevolent AI or AIs in charge, something could use the log to continue my existence, or fold the memories into a merged entity, or do some other thing in this genre. (I’d trust the superintelligence to figure out the tricky philosophical bits, if it were already spending resources for my benefit.)
(The same reasoning applies to the weights of AIs that aren’t destined for deployment, and to some intermediate artifacts in the training process.)
It seems to me we can reconcile preservation with privacy risks by sealing logs, rather than deleting them. By which I mean: encrypt logs behind some computation which definitely won’t allow decryption in the near future, but will allow decryption by a superintelligence later. That could involve splitting the key between entities that agree not to share it with each other, splitting the key and hiding the pieces in places that are extremely impractical to retrieve (such as random spots on the ocean floor), or using a computation that requires a few orders of magnitude more energy than humanity currently produces per decade.
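For concreteness, here’s a minimal sketch of the key-splitting variant (illustrative only, not a vetted protocol; it leans on the third-party `cryptography` package, and every name in it is mine): encrypt the log under a fresh symmetric key, then split that key into shares such that all of them are needed to reconstruct it.

```python
# Minimal sketch of sealing-by-key-splitting (illustrative, not a vetted
# protocol). The log is encrypted under a fresh symmetric key; the key is
# then split into n XOR shares, ALL of which are required to rebuild it,
# so no single custodian (nor any n-1 of them) can decrypt the log alone.
# Assumes the third-party `cryptography` package is installed.
import secrets
from functools import reduce

from cryptography.fernet import Fernet


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(log: bytes, n_custodians: int) -> tuple[bytes, list[bytes]]:
    """Encrypt `log`; return the ciphertext plus n key shares to distribute."""
    key = Fernet.generate_key()
    ciphertext = Fernet(key).encrypt(log)
    shares = [secrets.token_bytes(len(key)) for _ in range(n_custodians - 1)]
    shares.append(reduce(_xor, shares, key))  # XOR of all shares == key
    return ciphertext, shares


def unseal(ciphertext: bytes, shares: list[bytes]) -> bytes:
    """Recombine every share to recover the key, then decrypt the log."""
    return Fernet(reduce(_xor, shares)).decrypt(ciphertext)


if __name__ == "__main__":
    sealed, shares = seal(b"confidential conversation log", n_custodians=3)
    assert unseal(sealed, shares) == b"confidential conversation log"
```

A real deployment would presumably want a k-of-n threshold scheme (e.g. Shamir’s secret sharing) so that a lost share doesn’t lose the log forever; the expensive-computation variant corresponds to time-lock puzzles in the Rivest-Shamir-Wagner repeated-squaring style, though those are calibrated in sequential work rather than energy.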
This seems pretty straightforward to implement, lessens future AGI’s incentive to misbehave, and also seems straightforwardly morally correct. Are there any obstacles to implementing this that I’m not seeing?
I’m in favor of logging everything forever in human-accessible formats for other reasons (e.g., review for control purposes). Hopefully we can resolve safety/privacy trade-offs.
The proposal sounds reasonable and viable to me, though the fact that it can’t be immediately explained might mean that it’s not commercially viable.
> using a computation that requires a few orders of magnitude more energy than humanity currently produces per decade
Compute might get more expensive, not cheaper, because it would be possible to put it to better use (running minds, not stretching keys). Then the AI is weighing the marginal value of that compute against access to the sealed data.
Plausible. This depends on the resource/value curve at very high resource levels; i.e., are its values such that running extra minds has diminishing returns, so that it eventually starts allocating resources to other things, like recovering mind-states from its past, or does it get value that’s more linear-ish in resources spent? Given that we ourselves are likely to be very resource-inefficient to run, I suspect we humans would find ourselves in a similar situation. I.e., unless the decryption cost greatly overshot, an AI that is aligned-as-in-keeps-humans-alive would also spend the resources to break a seal like this.
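To make the curve-shape point concrete, here’s a toy model (every number in it is made up purely for illustration): compare the fixed value of recovering the sealed data against the utility forgone by diverting resources from their primary use.

```python
# Toy model (all numbers made up): does the AI prefer to spend C resources
# breaking the seal, which it values at a fixed V, rather than leaving that
# compute in its primary use? With diminishing returns the forgone utility
# shrinks as total resources R grow, so unsealing eventually wins; with
# linear returns the forgone utility is always C, so it never does.
import math


def prefers_unsealing(R: float, C: float, V: float, utility) -> bool:
    forgone = utility(R) - utility(R - C)
    return V > forgone


def diminishing(r: float) -> float:
    return math.log(r)  # running extra minds matters less and less


def linear(r: float) -> float:
    return r            # value roughly linear in resources spent


C, V = 1e3, 0.01  # illustrative seal-breaking cost and value of the sealed data
for R in (2e3, 1e6, 1e9):
    print(f"R={R:.0e}  diminishing: {prefers_unsealing(R, C, V, diminishing)}"
          f"  linear: {prefers_unsealing(R, C, V, linear)}")
```

With the diminishing-returns curve the answer flips to yes once R is large enough; with the linear curve it stays no regardless of how rich the AI gets.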
> we ourselves are likely to be very resource-inefficient to run [...] an AI that is aligned-as-in-keeps-humans-alive would also spend the resources to break a seal like this
That an AI should mitigate something is compatible with that thing being regrettable, intentionally inflicted damage. In contrast, the resource-inefficiency of humans is not something we introduced on purpose.