The critical issue is whether consequentialist mesa-optimizers will arise. If consequentialist mesa-optimizers don’t arise, as in the thread linked below, then much of the safety concern is gone.

https://www.lesswrong.com/posts/firtXAWGdvzXYAh9B/paper-transformers-learn-in-context-by-gradient-descent#pbEciBKsk86xmcgqb
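For concreteness, here is a toy of the mechanism the linked paper argues for (a minimal sketch of my own, not the paper’s code): a trained transformer’s forward pass can implement gradient descent on the labelled examples supplied in its context. The sketch below just runs that in-context gradient descent explicitly, for 1-d linear regression.

```python
import numpy as np

# Toy illustration (my sketch, assumed setup): gradient descent on the
# (x, y) examples supplied "in context", here 1-d linear regression.
rng = np.random.default_rng(0)
true_w = 2.5
xs = rng.normal(size=8)                      # in-context examples
ys = true_w * xs + 0.1 * rng.normal(size=8)  # noisy labels

w, lr = 0.0, 0.1            # weight the forward pass would carry in activations
for _ in range(50):         # GD steps on the in-context squared loss
    grad = np.mean(2.0 * (w * xs - ys) * xs)
    w -= lr * grad

x_query = 1.7
print(f"in-context fit w ~ {w:.3f}; prediction for query: {w * x_query:.3f}")
```

The point relevant to this thread: that is search over w, so it is mesa-optimization in the definitional sense, but there is no represented goal about outcomes in the world, i.e., nothing consequentialist.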
Any agentic AGI built via deep learning will almost by definition be a consequentialist mesa-optimizer (in the broad sense of consequentialism you are talking about, I think). It’ll be performing some sort of internal search to choose actions, while SGD, or whatever the outer training loop is, performs a ‘search’ over its parameters. So, boom: base optimizer and mesa-optimizer.
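As a minimal sketch of that nesting (all names and the tiny architecture here are my illustrative assumptions, not a real training setup): the inner model chooses actions by explicit search against an internally represented value function, while an outer loop searches over that value function’s parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def inner_search(value_params, state, candidates):
    """Mesa-optimizer: explicit internal search over candidate actions,
    scored by an explicitly represented objective (a linear value function)."""
    scores = [value_params @ np.concatenate([state, a]) for a in candidates]
    return candidates[int(np.argmax(scores))]

def episode_return(value_params, ep_rng):
    """Run the search-based policy once; score it on the outer objective."""
    state = ep_rng.normal(size=3)
    candidates = [ep_rng.normal(size=3) for _ in range(4)]
    action = inner_search(value_params, state, candidates)
    return -np.sum((state + action) ** 2)   # outer objective: drive state to 0

# Base optimizer: crude zeroth-order hill climbing over the inner objective's
# parameters, standing in for SGD/RL as the outer training loop.
params = rng.normal(size=6)
for step in range(300):
    noise = 0.1 * rng.normal(size=6)
    # compare old vs. perturbed params on the same random episode
    if (episode_return(params + noise, np.random.default_rng(step)) >
            episode_return(params, np.random.default_rng(step))):
        params += noise

print("outer-trained inner objective:", np.round(params, 2))
```

The outer loop never manipulates the inner search directly; it only adjusts the parameters of the objective the search scores against, which is exactly the base-optimizer/mesa-optimizer split.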
Quoting LawrenceC from that very thread:
<begin quote>
Well, no, that’s not the definition of optimizer in the mesa-optimization post! Evan gives the following definition of an optimizer:
A system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system
And the following definition of a mesa-optimizer:
Mesa-optimization occurs when a base optimizer (in searching for algorithms to solve some problem) finds a model that is itself an optimizer, which we will call a mesa-optimizer.
<end quote>
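To make the quoted definition concrete, here is roughly the smallest system that satisfies it (a toy of my own, hypothetical numbers): it internally searches a space of candidate plans for the elements that score highest under an objective function explicitly represented inside the system.

```python
def objective(plan):
    """Explicitly represented objective function."""
    return -abs(sum(plan) - 10)

# Search space of possible plans (pairs of action magnitudes).
search_space = [(a, b) for a in range(10) for b in range(10)]

# Internal search: scan the space for a high-scoring element.
best_plan = max(search_space, key=objective)
print(best_plan, objective(best_plan))   # -> (1, 9) 0
```

Note how little the definition demands: brute-force enumeration with an explicit score counts, with no goals over world-states required.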
The “mesa” part is pretty trivial. Humans are mesa-optimizers relative to the base optimizer of evolution. If an AGI is an optimizer at all, it’s a mesa-optimizer relative to the process that built it: the human R&D industry if nothing else, though given deep learning it’ll probably be gradient descent or RL.
I think the critical comment I wanted to highlight was nostalgebraist’s comment in that thread.