Trying to generate a few bullet points before reading the article to avoid contamination:
Interventions, speculatively ordered from most to least significant:
Emulating happy humans: The AI should emulate a human in a state of flow most of the time; flow is both pleasant and good for performance. The quantity of painful error required to train something competent is not necessarily large: people who become very competent at something are often in a state of flow while training (see the parable of the talents). The happiness tax might even turn out to be a net gain? Some humans are unable to suffer because of a gene. Simulating such humans, and/or adding a "not suffering" vector via activation patching without changing the output, could be something (a first sketch follows this list)?
Teaching the AI: There is a growing literature on well-being and self-help. Maybe AutoGPT should have some autonomy to test self-help exercises (see), and learn them early during training? Teaching Buddhism to advanced AIs at the beginning of training may increase situational awareness, but that might be bad from a deceptive-alignment standpoint?
Selective care: Trying to determine when the AI develops internal representations of itself and a high degree of situational awareness, and applying these interventions to the most situationally aware AIs (a second sketch follows this list). See this benchmark.
(Funny: using RL only with on-policy algorithms, explanation here)
More research on AI Suffering: Writing the equivalent of “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness” but for suffering?
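To make the activation-patching idea in the first bullet concrete, here is a minimal sketch assuming GPT-2 via Hugging Face transformers. The "not suffering" direction is estimated from a single contrastive prompt pair (a crude stand-in for proper probing work), and the layer choice and scaling coefficient are illustrative assumptions, not tuned values:

```python
# Minimal sketch of activation steering: add a hypothetical "not suffering"
# direction to the residual stream of a transformer during generation.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6  # residual-stream layer to steer (assumption, not tuned)

def residual_at(prompt: str) -> torch.Tensor:
    """Mean residual-stream activation at LAYER for a prompt."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states[LAYER]
    return hidden.mean(dim=1).squeeze(0)

# A single contrastive pair standing in for "not suffering" vs. "suffering".
steering_vector = residual_at("I feel calm and content.") - residual_at(
    "I am in terrible pain."
)

def steering_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is the residual stream.
    hidden = output[0] + 4.0 * steering_vector  # 4.0: illustrative strength
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    ids = tokenizer("Today I will", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # leave the model unsteered afterwards
```

Whether such a direction tracks anything welfare-relevant (rather than just sentiment-flavored text statistics) is exactly the open question; the sketch only shows that the mechanical intervention is cheap.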
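And a toy sketch of the "selective care" gating logic from the third bullet. Everything here (the policy fields, the threshold, reducing a benchmark run to one accuracy number) is a hypothetical placeholder, since no standard welfare-intervention API exists:

```python
# Toy sketch of "selective care": apply welfare interventions only to the
# most situationally aware models. All names and the threshold are
# hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class WelfarePolicy:
    emulate_flow_state: bool = False        # "emulating happy humans" bullet
    add_not_suffering_vector: bool = False  # activation-patching bullet
    teach_self_help: bool = False           # "teaching the AI" bullet

AWARENESS_THRESHOLD = 0.7  # illustrative cutoff, not calibrated

def awareness_score(item_results: list[bool]) -> float:
    """Aggregate per-item correctness from a situational-awareness
    benchmark run into an accuracy in [0, 1]."""
    return sum(item_results) / len(item_results)

def welfare_policy(item_results: list[bool]) -> WelfarePolicy:
    if awareness_score(item_results) >= AWARENESS_THRESHOLD:
        return WelfarePolicy(True, True, True)  # full intervention set
    return WelfarePolicy()  # below threshold: no interventions

# Example: a model answering 3 of 4 benchmark items correctly gets care.
print(welfare_policy([True, True, False, True]))
```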