Good question. There’s a great amount of confusion over the exact definition, but in the context of this post specifically:
An optimizer is a very advanced meta-learning algorithm that can learn the rules of (effectively) any environment and perform well in it. It’s general by definition. It’s efficient because this generality allows it to use maximally efficient internal representations of its environment.
For example, consider a (generalist) ML model that’s fed the full description of the Solar System at the level of individual atoms, and which is asked to roughly predict the movement of Earth over the next year. It can keep modeling things at the level of atoms; or, it can dump the overwhelming majority of that information, collapse sufficiently large objects into point masses, and use Cowell’s method.
The second option is greatly more efficient, while decreasing the accuracy only marginally. However, to do that, the model needs to know how to translate between different internal representations[1], and how to model and achieve goals in arbitrary systems[2].
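To make the coarse-graining concrete, here’s a minimal sketch (in Python; the helper names and the leapfrog integrator are illustrative choices of mine, not anything from the post) of what “collapse sufficiently large objects into point masses and use Cowell’s method” cashes out to: reduce each body to its centre of mass, then numerically integrate the Newtonian equations of motion by direct summation of the accelerations.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def collapse_to_point_mass(atom_positions, atom_masses):
    """Coarse-grain a cloud of atoms (N x 3 positions, N masses) into a
    single point mass located at the cloud's centre of mass."""
    total_mass = atom_masses.sum()
    com = (atom_positions * atom_masses[:, None]).sum(axis=0) / total_mass
    return com, total_mass

def accelerations(positions, masses):
    """Newtonian acceleration on each body by direct summation over all the
    others; the heart of Cowell's method is simply integrating these
    equations of motion numerically in Cartesian coordinates."""
    acc = np.zeros_like(positions)
    for i in range(len(positions)):
        for j in range(len(positions)):
            if i != j:
                r = positions[j] - positions[i]
                acc[i] += G * masses[j] * r / np.linalg.norm(r) ** 3
    return acc

def propagate(positions, velocities, masses, dt, steps):
    """Step the point-mass system forward with a simple leapfrog integrator."""
    acc = accelerations(positions, masses)
    for _ in range(steps):
        velocities += 0.5 * dt * acc
        positions += dt * velocities
        acc = accelerations(positions, masses)
        velocities += 0.5 * dt * acc
    return positions, velocities
```

A handful of point masses stepped forward like this is incomparably cheaper than tracking every atom individually, and it still predicts Earth’s trajectory over a year to far better than “rough” accuracy.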
The same property that allows an optimizer to perform well in any environment allows it to efficiently model any environment. And vice versa, which is the bad part. The ability to efficiently model any environment allows an agent to perform well in any environment, so math!superintelligence would translate to real-world!superintelligence, and a math!goal would be mirrored by some real!goal. Next thing we know, everything is paperclips.
[1] E. g., how to keep track of what “Earth” is, as it moves from being a bunch of atoms to a point mass.
[2] Assuming it hasn’t been trained for this task specifically, of course, in which case it can just learn how to translate its inputs into this specific high-level representation, how to work with this specific high-level representation, and nothing else. But we’re assuming a generalist model here: there was nothing like this in its training dataset.
> An optimizer is a very advanced meta-learning algorithm that can learn the rules of (effectively) any environment and perform well in it. It’s general by definition.

A square circle is square and circular by definition, but I still don’t believe in them. There has to be a trade-off between generality and efficiency.

> It can keep modeling things at the level of atoms; or, it can dump the overwhelming majority of that information, collapse sufficiently large objects into point masses, and use Cowell’s method.

Once it has dumped the overwhelming majority of the information, it is no longer general. It’s not (fully) general and (fully) efficient.