Speculation on mapping the moral landscape for future AI alignment
I will warn everyone from the start that I am not a researcher, and that I am utterly intimidated by the amount of reading material out there that might make my speculations here redundant. See this as me venting my brain of an idea I hope has some merit.
First: Aligning an AI with a single goal seems to lead all too easily to a paperclip optimiser.
-> It would be logical to give an AI multiple goals/values and a framework for how these goals/values interact with each other, so one can establish a system of counterbalancing.
Second: Even if we decide that our final values should be static, the development/programming/decision process behind these values will be dynamic and will thus allow value drift to occur.
-> Values need to be evaluated for distinctiveness, otherwise they will not counterbalance each other. What do we mean by distinctiveness? That for each pair of values we can find a scenario where we can optimise one value independently, without correlation to the other (a toy version of such a check is sketched in the code after the third point).
Third: If values cannot be raised together, they will start to counterbalance each other.
-> We cannot simply group an arbitrary number of distinct values; we need to map which of them are cooperative. Cooperative meaning: having a large spectrum of scenarios where they can be optimised together to a high level.
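To make the two criteria above a bit more concrete, here is a minimal toy sketch of how a distinctiveness check and a cooperation score could be run over sampled scenarios. Everything in it is a hypothetical stand-in: sample_scenario, score_value and the thresholds are placeholders for whatever scenario generator and value model one would actually use.

```python
import random

def sample_scenario():
    # Hypothetical placeholder: a real test would draw a world state or
    # policy outcome from some scenario generator.
    return {"resources": random.random(), "wellbeing": random.random()}

def score_value(value, scenario):
    # Hypothetical placeholder: maps a scenario to how well a value is met (0..1).
    return scenario["resources"] if value == "survival" else scenario["wellbeing"]

def pearson(xs, ys):
    # Plain Pearson correlation, no external libraries needed.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def are_distinct(a, b, scenarios, max_corr=0.9):
    # Distinct = the two value scores are not (almost) perfectly correlated
    # across the sample, i.e. one can move without the other following.
    sa = [score_value(a, s) for s in scenarios]
    sb = [score_value(b, s) for s in scenarios]
    return abs(pearson(sa, sb)) < max_corr

def cooperation_score(a, b, scenarios, high=0.7):
    # Fraction of scenarios in which both values are satisfied to a high
    # level at once: near 1.0 means the pair rarely forces a trade-off.
    both = sum(1 for s in scenarios
               if score_value(a, s) >= high and score_value(b, s) >= high)
    return both / len(scenarios)

scenarios = [sample_scenario() for _ in range(1000)]
print(are_distinct("survival", "happiness", scenarios))      # True for this toy model
print(cooperation_score("survival", "happiness", scenarios)) # roughly 0.09 here
```

The only point of the sketch is that both criteria reduce to statistics over the same pool of sampled scenarios, which is what could make preliminary tests cheap to run.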
Now the positive is: we can use seed values like conforming to reality, survival of the human race and of life, happiness, etc., and see which groupings naturally occur.
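As a purely illustrative follow-up (the pairwise numbers below are made up by me, not measurements), "groupings naturally occurring" could be operationalised as a simple greedy clustering over a matrix of cooperation scores:

```python
def greedy_groups(values, coop, min_coop=0.5):
    # A value joins an existing group only if its cooperation score with
    # every current member is above the threshold; otherwise it starts
    # a new group of its own.
    groups = []
    for v in values:
        for g in groups:
            if all(coop[frozenset((v, m))] >= min_coop for m in g):
                g.append(v)
                break
        else:
            groups.append([v])
    return groups

values = ["conforming to reality", "survival of humanity and life", "happiness"]
# Made-up illustrative pairwise cooperation scores, not real measurements.
coop = {
    frozenset(("conforming to reality", "survival of humanity and life")): 0.8,
    frozenset(("conforming to reality", "happiness")): 0.6,
    frozenset(("survival of humanity and life", "happiness")): 0.4,
}
print(greedy_groups(values, coop))
# -> [['conforming to reality', 'survival of humanity and life'], ['happiness']]
```

A real mapping would need a better clustering rule than one greedy pass, but it shows what a "natural grouping" could mean operationally.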
The negative: we need to define the framework in which values interact before we can evaluate group fitness.
For example: How do we treat low fitness of a single value? Do we encourage spikes of high fitness, or seek the best average? What about future fitness and the reliability of our information? All of these are topics of their own.
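Those open questions can at least be made explicit as different aggregation rules over the per-value fitness scores. A rough sketch of three such rules (the names and the 0.5 penalty weight are arbitrary choices of mine, not established metrics):

```python
def group_fitness(scores, rule="average"):
    # Three ways to collapse per-value fitness into one group score:
    #  - "average" rewards a high mean even if one value does badly,
    #  - "worst_case" lets a single weak value drag the whole group down,
    #  - "penalised" is a middle ground: the mean minus a penalty for spread.
    if rule == "average":
        return sum(scores) / len(scores)
    if rule == "worst_case":
        return min(scores)
    if rule == "penalised":
        return sum(scores) / len(scores) - 0.5 * (max(scores) - min(scores))
    raise ValueError(f"unknown rule: {rule}")

scores = [0.9, 0.8, 0.2]  # one value in the group is doing poorly
for rule in ("average", "worst_case", "penalised"):
    print(rule, round(group_fitness(scores, rule), 2))
# average 0.63, worst_case 0.2, penalised 0.28
```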
In conclusion: What's your opinion on such a moral map? Is it important, or just busywork? Do you have an idea of where and how to run preliminary tests on such a mapping?