Daniel Kokotajlo comments on Anomalous tokens reveal the original identities of Instruct models

Daniel Kokotajlo 11 Feb 2023 15:18 UTC
2 points
Any agentic AGI built via deep learning will almost by definition be a consequentialist mesaoptimizer (in the broad sense of consequentialism you are talking about, I think). It’ll be performing some sort of internal search to choose actions, while SGD (or whatever the outer training loop is) performs ‘search’ to update its parameters. So, boom, base optimizer and mesaoptimizer.

Quoting LawrenceC from that very thread:

<begin quote>
Well, no, that’s not the definition of optimizer in the mesa-optimization post! Evan gives the following definition of an optimizer:
A system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system
And the following definition of a mesa-optimizer:
Mesa-optimization occurs when a base optimizer (in searching for algorithms to solve some problem) finds a model that is itself an optimizer, which we will call a mesa-optimizer.
<end quote>

The “mesa” part is pretty trivial. Humans are mesaoptimizers relative to the base optimizer of evolution. If an AGI is an optimizer at all, it’s a mesaoptimizer relative to the process that built it: the human R&D industry if nothing else, though given deep learning it’ll probably be gradient descent or RL.
- Noosphere89 11 Feb 2023 15:36 UTC
  1 point
  I think the critical comment I wanted to highlight was nostalgebraist’s comment in that thread.