Why would something being poorly defined make it harder to optimize?
Well, you need to know what you’re optimizing! In a two-player game, if the second player gets to redefine the rules after the first player has moved, then they get a huge advantage. That’s essentially what happens by defining “the spirit of cricket” vaguely.
How is this different from games with a referee? A foul is what the referee says it is; the spirit of cricket is what the cricket-lovers say it is. In both cases a savvy optimizer would start modelling the relevant humans and predicting what they would and wouldn’t judge illegal.
I agree that different rules or optimization targets have different complexity levels, and the spirit of cricket seems more complicated than ordinary fouls which are more complicated than “did the ball hit the pegs.”
I think the two-player-game-but-player2-gets-to-modify-the-rules is not a fair analogy here. Like I said it’s the cricket-loving public that decides, not player 2.
Ah, sorry for the unclarity. The game I’m referring to is the one between the player who’s trying to game the rules and the referee/rule-judging body that’s trying to avoid being Goodharted. The judging body can either “move first” by specifying the rules precisely, or “move second” by judging whether or not an action broke the rules according to illegible criteria. The latter is straightforwardly much harder to Goodhart. Or they can do a combination: I think of referees as doing a combination of these things, because they’re meant to interpret fixed, well-defined rules, but there’s still some room for judgment calls.
I think the two-player-game-but-player2-gets-to-modify-the-rules is not a fair analogy here. Like I said it’s the cricket-loving public that decides, not player 2.
Broadly, I agree with Richard Ngo’s characterisation. You are right that the ‘cricket-loving public’ plays some part in determining what counts as ‘within the spirit’, but it is often the decision of the players themselves that is most important.
How is this different from games with a referee? A foul is what the referee says it is; the spirit of cricket is what the cricket-lovers say it is. In both cases a savvy optimizer would start modelling the relevant humans and predicting what they would and wouldn’t judge illegal.
I agree that different rules or optimization targets have different complexity levels, and the spirit of cricket seems more complicated than ordinary fouls which are more complicated than “did the ball hit the pegs.”
I agree with you that complexity is an important factor. I think you are correct that in principle this can still be Goodharted, but in practice it doesn’t seem to happen, as it is much harder than Goodharting the written rules of the game, due to the increased complexity. There is nothing to prevent a superintelligent player from brainwashing the opposing team and the general public into agreeing that their actions are legitimate. It’s just that doing this is a lot harder than the normal ways of ‘gaming the system’. This is why I used the term ‘resists Goodhart’s law’ as opposed to ‘defeats Goodhart’s law’ or something similar.
It may be that there isn’t enough money in cricket for it to be attractive to the hypercompetitive athletes and coaches who are most likely to apply that optimization pressure?
Second player judging is in some ways an advantage, and in other ways a disadvantage (and either way not really what’s happening in cricket, because the players are actually cooperating rather than trying to exploit the rules but being foiled).
The disadvantage is that, lacking rules, it’s hard to communicate what you want the first player to do at all! You don’t get children to play soccer by not telling them any rules or giving any demonstrations, only judging their actions as legal or illegal.
If you do manage to communicate your preferences about what kind of game is even being played to player 1, then there’s no qualitative barrier to communicating enough information about your standards that you can get Goodharted.
Ah, sorry for the unclarity. The game I’m referring to is the one between the player who’s trying to game the rules and the referee/rule-judging body that’s trying to avoid being Goodharted. The judging body can either “move first” by specifying the rules precisely, or “move second” by judging whether or not an action broke the rules according to illegible criteria. The latter is straightforwardly much harder to Goodhart. Or they can do a combination: I think of referees as doing a combination of these things, because they’re meant to interpret fixed, well-defined rules, but there’s still some room for judgment calls.
Ahhh, I see, yes that makes sense.