I’m wondering whether this framing (choosing among a set of candidate worlds) is the most productive one. Does it make sense to use criteria like corrigibility, minimizing impact, and preferring reversible actions, or do we have no reliable way to evaluate whether these hold?