Similar to this, but not the same: experiment with an AGI that is set to align another AGI. For example, it might need to complete some tasks to get reward, but those tasks have to be carried out by the other AGI, and it doesn't know beforehand what the tasks will be. One goal would be to see what methods an AGI might use to align another AGI (methods that might then be used to align AGI-systems that are sub-systems of a larger AGI-system), and to see whether the output of this AGI converges with results from AGIs aligned by other principles. A rough sketch of the setup is given at the end of this note.
Don't expect that this would be that fruitful, but I haven't thought about it much, and who knows.
Would need to avoid suffering sub-routines.
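Below is a minimal, hypothetical sketch of the setup described above, under the assumptions that the "aligner" can only shape the other agent before the tasks are revealed and is rewarded solely through that agent's performance. All names (AlignerAgent, SubordinateAgent, the one-parameter "disposition", the perturb-and-keep-best update rule) are invented for illustration and are not meant as a definitive design.

```python
# Hypothetical toy sketch of "AGI aligning AGI for hidden tasks".
# Everything here is a stand-in: real agents, tasks, and learning rules
# would be vastly more complex.
import random

class SubordinateAgent:
    """Stand-in for the agent that actually performs the hidden tasks."""
    def __init__(self):
        # A single "disposition" parameter the aligner can try to shape.
        self.disposition = 0.0

    def perform(self, task: float) -> float:
        # Performance depends on how well the instilled disposition
        # generalises to a task that was unknown when the aligner acted.
        noise = random.gauss(0, 0.1)
        return max(0.0, 1.0 - abs(self.disposition - task) + noise)

class AlignerAgent:
    """Stand-in for the agent rewarded only via the subordinate's results."""
    def __init__(self):
        self.policy = 0.0       # disposition it will try to instil
        self.best_policy = 0.0
        self.best_reward = -1.0

    def align(self, subordinate: SubordinateAgent) -> None:
        # Its only lever: shape the subordinate before tasks are revealed.
        subordinate.disposition = self.policy

    def update(self, reward: float) -> None:
        # Crude perturb-and-keep-best rule as a stand-in for real learning.
        if reward > self.best_reward:
            self.best_reward, self.best_policy = reward, self.policy
        self.policy = min(1.0, max(0.0, self.best_policy + random.gauss(0, 0.1)))

def run_episode(aligner: AlignerAgent) -> float:
    subordinate = SubordinateAgent()
    aligner.align(subordinate)          # aligner acts first, blind to the task
    task = random.uniform(0, 1)         # task revealed only afterwards
    reward = subordinate.perform(task)  # reward routes through the subordinate
    aligner.update(reward)
    return reward

if __name__ == "__main__":
    aligner = AlignerAgent()
    rewards = [run_episode(aligner) for _ in range(1000)]
    print(f"mean reward over last 100 episodes: {sum(rewards[-100:]) / 100:.3f}")
```

In this toy version the aligner gradually learns which disposition generalises best over the unknown task distribution; the interesting (and unmodelled) part would be inspecting what alignment strategies a more capable aligner discovers, and whether its outputs converge with those of agents aligned by other principles.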