As I said in another comment: learning human values from, say, a fixed corpus of texts is a good start, but it doesn't solve the "chicken or the egg" problem: we start by running a non-aligned AI that is learning human values, yet we want the first AI to be aligned already. One possible failure mode: the non-aligned AI could run away before it has finished learning human values from the texts.