Suppose an outcome pump picks a random property, checks whether papers with that property Goodhart your scoring, and time-loops until it finds one. Do you think it would eventually find one? Unfortunately, optimization tries all properties in parallel, without even needing an outcome pump.
Treat hardness proofs (perpetual motion, NP, …) as neon tubes marking the box to think outside of. Find any difference between the proven-hard problem and yours (one usually exists!), then imagine leads that wouldn’t help on the proven-hard problem: leads you don’t get better at ruling out by knowing the existing proof.
To avoid falling to the dire kind of “adversary” that moves after you do, don’t calculate a number.
Sorry, I think I have an idea of what you’re saying, but I’m not really sure. Do you mind elaborating? With a little less LessWrong lingo, please.