Great post, it was very interesting to read. A few thoughts occurred to me that might be of interest to others, and I'd welcome others' input on them as well.
Imagine an AI trained as an oracle: it is given a wide variety of questions and selected based on how “truthful” its answers are. Assuming this approach were possible and could produce an AGI, might it be a viable way to “create selection pressures towards agents which think in the ways we want”? In other words, might this create an aligned AI regardless of extreme optima?
Another thought that occurred to me: suppose an AI is “let loose,” spreads to new hardware, encounters the “real world,” and is exposed to massive amounts of new data. Then the range of circumstances it operates in would of course be very broad. In the oracle example, by contrast, potentially everything could stay the same during and after training, except for the questions asked. Could this increase safety, since the range of circumstances in which it would need to have desirable motivations would be comparatively narrow?
Lastly, I'm new to LessWrong, so I'm especially grateful for any input on how I can improve my reasoning and commenting.