Corrigibility, reward hacking, Goodhart
How do we make an AI corrigible? How do we avoid reward hacking? How do we make an AI care about real things rather than measures of real things (Goodhart’s Law)?
With current approaches, you have to force those properties onto the AI from the outside, so they never become fundamental to how the AI thinks and learns.
I think the “money system” approach is interesting because it can make all those properties fundamental: a “money system” needs all of them in order to exist. It needs to be somewhat real, resist being hacked, allow corrections when a loophole is discovered, and avoid being completely controlled by a single agent.
I’m not saying it solves everything. But it’s a way to deeply internalize some important safety properties.
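To make the reward hacking / Goodhart problem concrete, here is a toy sketch. The action names and numbers are made up for illustration; the only point is that an agent optimizing the measure and an agent optimizing the real thing can pick different actions.

```python
# Hypothetical actions, each mapped to (real value produced, value shown by the measure).
# All names and numbers are illustrative assumptions.
ACTIONS = {
    "do_the_task": (10, 10),
    "cut_corners": (4, 12),
    "tamper_with_meter": (0, 100),
}

def best_action(score):
    """Return the action that maximizes score(real, measured)."""
    return max(ACTIONS, key=lambda name: score(*ACTIONS[name]))

# An agent that only cares about the measure ends up hacking it.
proxy_choice = best_action(lambda real, measured: measured)

# An agent that cares about the real thing does the actual task.
real_choice = best_action(lambda real, measured: real)

print(proxy_choice)  # tamper_with_meter
print(real_choice)   # do_the_task
```

In this sketch, nothing about the proxy-optimizing agent is “broken”: it does exactly what it was asked to do, which is the problem.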
Kant, Categorical Imperative
(See “Categorical imperative”, section “Application”.)
Kant’s applications of the categorical imperative, and the arguments he gives for them, are similar to reasoning about “money systems”. For example:
Does stealing make sense as a “money system”? No. If everyone steals, then personal property doesn’t exist and there’s nothing left to steal.
Note: I’m not talking about Kant’s conclusions, I’m talking about Kant’s style of reasoning.
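The stealing example can be written as a small check: a rule “makes sense as a system” if it is still usable once everyone follows it. This is only a rough sketch of the style of reasoning. The `Rule` class, the `property_exists` flag, and the `trading` rule used for contrast are my own illustrative assumptions, not anything from Kant.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, bool]

@dataclass(frozen=True)
class Rule:
    name: str
    # What must be true of the world for the rule to be usable at all.
    precondition: Callable[[State], bool]
    # What the world looks like if every agent follows the rule.
    universalize: Callable[[State], State]

def makes_sense_as_system(rule: Rule, world: State) -> bool:
    """True if the rule is still usable after everyone has adopted it."""
    return rule.precondition(rule.universalize(world))

world = {"property_exists": True}

# Stealing relies on personal property, but universal stealing destroys it.
stealing = Rule(
    name="steal",
    precondition=lambda s: s["property_exists"],
    universalize=lambda s: {**s, "property_exists": False},
)

# Trading (hypothetical contrast case) relies on property and leaves it intact.
trading = Rule(
    name="trade",
    precondition=lambda s: s["property_exists"],
    universalize=lambda s: dict(s),
)

print(makes_sense_as_system(stealing, world))  # False
print(makes_sense_as_system(trading, world))   # True
```

The check says nothing about whether a rule is good. It only asks whether the rule undermines the thing it depends on, which is the part of the reasoning I care about here.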