I think that given good value learning, safety isn't that difficult. Even a fairly halfhearted attempt at the sort of naive safety measures discussed will probably lead to non-catastrophic outcomes.
Tell it about mindcrime from the start. Give it lots of hard disks, and tell it to store anything that might possibly resemble a human mind. It only needs to work well enough with a bunch of MIRI people guiding it and answering its questions. Post-singularity, a superintelligence can check whether there are any human minds in the simulations it created when it was young and dumb. If there are, welcome those minds to the utopia.
As I said in another comment: learning human values from, say, fixed texts is a good start, but it doesn't solve the chicken-or-egg problem: we start by running a non-aligned AI that is learning human values, yet we want the first AI to be aligned already. One possible obstacle: the non-aligned AI could run away before it has finished learning human values from the texts.