martinkunev comments on Siren worlds and the perils of over-optimised search

martinkunev 27 Oct 2023 12:37 UTC
1 point
I’m wondering whether this framing (choosing among a set of candidate worlds) is the most productive. Does it make sense to use criteria like corrigibility, minimizing impact, and preferring reversible actions? Or do we have no reliable way to evaluate whether these hold?