So far as I know, every principle of this kind, except for Jessica Taylor’s “quantilization”, and “myopia” (not sure who correctly named this as a corrigibility principle), was invented by myself; eg “low impact”, “shutdownability”. (Though I don’t particularly think it hopeful if you claim that somebody else has publication priority on “low impact” or whatevs, in some stretched or even nonstretched way; ideas on the level of “low impact” have always seemed cheap to me to propose, harder to solve before the world ends.)
Low impact seems so easy to propose I doubt OP is the first.
I believe paulfchristiano has already raised this point, but at what level of ‘principles’ are being called for?
Myopia seems meant as a means to achieve shutdownability/modifiability.
Likewise for quanilization/TurnTrout’s work, on how to achieve low impact.
Low impact seems so easy to propose I doubt OP is the first.
I believe paulfchristiano has already raised this point, but at what level of ‘principles’ are being called for?
Myopia seems meant as a means to achieve shutdownability/modifiability.
Likewise for quanilization/TurnTrout’s work, on how to achieve low impact.