Thanks for writing this.
I wish you included an entry for your definition of ‘mesa-optimizer’. When you use the term, do you mean the definition from the paper* (an algorithm that’s literally doing search using the mesa objective as the criterion), or do you speak more loosely (e.g., a mesa-optimizer is an optimizer in the same sense as a human is an optimizer)?
A related question is: how would you describe a policy that’s a bag of heuristics which, when executed, systematically leads to interesting (low-entropy) low-base-objective states?
*Incidentally, looking back on the paper, it doesn’t look like we explicitly defined things this way, but it’s strongly implied that that’s the definition, and it appears to be how the term is used on AF.
Glad you liked it! I definitely mean mesa-optimizer to refer to something mechanistically implementing search. That being said, I’m not really sure whether humans count under that definition; I would probably say humans do count but are fairly non-central. In terms of the bag of heuristics model, I probably wouldn’t count that, though it depends on what “bag of heuristics” means exactly: if the heuristics are being used to guide a planning process or something, then I would call that a mesa-optimizer.
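To make the mechanistic distinction in the reply concrete, here is a toy sketch under loudly stated assumptions: a hypothetical one-dimensional environment where the state is an integer and actions move it by -1, 0, or +1, with an illustrative target of 10. All names (`heuristic_policy`, `search_policy`, `guided_search_policy`, `propose`) are made up for illustration and do not come from the paper or this thread. The heuristic policy is a fixed set of situation-to-action rules; the search policies choose actions by looking ahead and scoring outcomes with an internal objective, and the guided variant lets heuristics narrow which actions the search considers.

```python
from typing import Callable, List

State = int
Action = int

# Hypothetical toy environment: the state is an integer position and each
# action moves it by -1, 0, or +1. Listing 0 first means ties in the search
# below break toward staying put (max() keeps the first maximal element).
ACTIONS: List[Action] = [0, -1, 1]


def step(state: State, action: Action) -> State:
    return state + action


def heuristic_policy(state: State) -> Action:
    """A 'bag of heuristics': fixed situation -> action rules.

    It drives the state toward 10, but nothing inside it represents or
    evaluates an objective at decision time.
    """
    if state < 10:
        return 1
    if state > 10:
        return -1
    return 0


def guided_search_policy(
    state: State,
    mesa_objective: Callable[[State], float],
    propose: Callable[[State], List[Action]],
    depth: int = 3,
) -> Action:
    """Look ahead `depth` steps and pick the action whose best continuation
    scores highest under an internal (mesa) objective.

    `propose` stands in for heuristics that narrow which actions the search
    considers; the final choice is still made by evaluating outcomes against
    `mesa_objective`. `propose` must return a non-empty list.
    """

    def best_value(s: State, d: int) -> float:
        # Best mesa-objective value achievable after d more steps from s.
        if d == 0:
            return mesa_objective(s)
        return max(best_value(step(s, a), d - 1) for a in propose(s))

    return max(propose(state), key=lambda a: best_value(step(state, a), depth - 1))


def search_policy(
    state: State, mesa_objective: Callable[[State], float], depth: int = 3
) -> Action:
    """Unguided search: every action is a candidate."""
    return guided_search_policy(state, mesa_objective, lambda s: ACTIONS, depth)


if __name__ == "__main__":
    def target_10(s: State) -> float:
        # Hypothetical mesa objective: be as close to 10 as possible.
        return -abs(s - 10)

    for s in (0, 10, 20):
        print(s, heuristic_policy(s), search_policy(s, target_10))
```

On states 0, 10 and 20 the two policies pick the same action, which is the point of the example: behavior alone need not reveal which one is doing search. On the mechanistic reading in the reply, only the search-based policies would count as mesa-optimizers, with `guided_search_policy` corresponding to the case where heuristics steer a planning process.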