I’ll leave these two half-baked ideas here in case they’re somehow useful:
DO UNTIL - Construct an AI to perform its utility function until an undesirable failsafe condition is met. (Somehow) make the utility function not take the failsafes into account when calculating utility (can it be made blind to them somehow? Force the utility function to exclude their existence? Make lack of knowledge about the failsafes part of the utility function itself?). The failsafes could be every undesirable outcome we can think of, such as the human death rate exceeding X, biomass reduction, quantified human thought declining by X, mammalian species extinctions, quantified human suffering exceeding X, or whatever. One problem is how to objectively attribute these triggers causally to the AI (what if some other event trips a trigger and shuts down an AI we have come to rely on?).
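To make the shape of the idea a bit more concrete, here is a minimal sketch of the DO UNTIL loop: an outer supervisor runs the agent step by step and halts it the moment any failsafe trigger fires, with the triggers evaluated entirely outside whatever the agent's utility calculation can see. Every name here (Agent, WorldState, the specific thresholds) is a hypothetical placeholder, and the hard "blindness" requirement is simply assumed rather than solved.

```python
# Sketch only: run an agent until any externally-evaluated failsafe fires.
# The agent's utility function never sees FAILSAFES or this loop.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorldState:
    human_death_rate: float       # e.g. deaths per year, however measured
    biomass_index: float          # 1.0 = baseline global biomass
    quantified_suffering: float   # some agreed-upon suffering metric


# Failsafe triggers: each returns True when an undesirable condition is met.
# Thresholds are arbitrary placeholders for "X" in the text above.
FAILSAFES: List[Callable[[WorldState], bool]] = [
    lambda s: s.human_death_rate > 1e8,      # death rate exceeds X
    lambda s: s.biomass_index < 0.9,         # biomass reduced beyond tolerance
    lambda s: s.quantified_suffering > 1e6,  # suffering exceeds X
]


def run_until_failsafe(agent, observe_world: Callable[[], WorldState]) -> None:
    """Run the agent step by step; halt as soon as any failsafe condition holds."""
    while True:
        state = observe_world()
        if any(trigger(state) for trigger in FAILSAFES):
            agent.shut_down()   # hypothetical halt mechanism
            break
        agent.step()            # agent acts on its own utility for one step
```

Note the attribution problem mentioned above is untouched: this loop halts the agent whenever a trigger fires, whether or not the agent caused it.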
Energy limit - Limit the AI's activities (through its own utility function?) via an unambiguous, quantifiable resource such as matter moved or energy expended. The energy expended would (somehow) include all activity under its control. Alternatively this could be a rate rather than a total limit, but I think a rate would be more likely to go wrong. The idea would be to let the AGI go foom, but not give it the energy for other stuff like a paperclip universe. I am not sure this idea buys all that much safety, but here it is.
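A minimal sketch of how the energy limit might sit inside the utility function: any plan whose attributable energy cost would push the agent over a fixed lifetime budget gets utility negative infinity, so it is never chosen. The budget value, base_utility, and energy_cost are all hypothetical placeholders, and the genuinely hard part flagged above (attributing *all* energy under the AI's control) is simply assumed away by energy_cost().

```python
# Sketch only: a hard lifetime energy budget baked into plan evaluation.

ENERGY_BUDGET_J = 1e15  # placeholder: total joules the AI may ever cause to be expended


def bounded_utility(plan,
                    base_utility,            # callable: plan -> float
                    energy_cost,             # callable: plan -> joules attributable to the AI
                    energy_spent_so_far: float) -> float:
    """Utility of a plan, clamped to -inf if it would exceed the energy budget."""
    if energy_spent_so_far + energy_cost(plan) > ENERGY_BUDGET_J:
        return float("-inf")  # over-budget plans are never selected
    return base_utility(plan)
```

A rate-based version would replace the cumulative check with one over a rolling window, which is roughly why I suspect it is easier to get wrong.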
I don’t know if an intelligence explosion will truly be possible, but plenty of people smarter than I am seem to think so… good luck in this field of work!