I haven’t read your papers but your proposal seems like it would scale up until the point when the AGI looks at itself. If it can’t learn at this point then I find it hard to believe it’s generally capable, and if it can, it will have incentive to simply remove the device or create a copy of itself that is correct about its own world model. Do you address this in the articles?
On the other hand, this made me curious about what we could do with an advanced model that is instructed to not learn and also whether we can even define and ensure a model stops learning.
I haven’t read your papers but your proposal seems like it would scale up until the point when the AGI looks at itself. [...] Do you address this in the articles?
Yes, I address this; see for example the part about 'The possibility of learned self-knowledge' in the sequence.
I show there that any RL agent, even a non-AGI, will always have
the latent ability to ‘look at itself’ and create a machine-learned model of its compute core internals.
What is done with this latent ability is up to the designer.
The key thing here is that you have a choice as a designer: you can decide whether you want to design an agent which indeed uses this latent ability to ‘look at itself’.
Once you decide that you don’t want to use this latent ability, certain safety/corrigibility problems
become a lot more tractable.
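To make that design choice concrete, here is a minimal sketch, assuming a toy agent whose observation is assembled by a hypothetical build_observation function; the names, the 'core internals' readout, and the toy dimensions are illustrative only, not taken from the papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 'sense_environment' returns external observables,
# 'read_core_internals' is the 'look at itself' channel (e.g. a readout of
# the agent's own parameters). Both names are illustrative.
def sense_environment():
    return rng.normal(size=4)

def read_core_internals(params):
    return params.copy()

def build_observation(params, use_self_channel):
    """The designer's choice: whether the learned world model ever sees
    the compute core internals at all."""
    obs = sense_environment()
    if use_self_channel:
        obs = np.concatenate([obs, read_core_internals(params)])
    return obs

params = rng.normal(size=3)  # the agent's own learned parameters

# Design A: the latent ability is left unused -- the world model is fit
# only on external observables, so it cannot encode self-knowledge.
obs_a = build_observation(params, use_self_channel=False)

# Design B: the same latent ability is wired in on purpose.
obs_b = build_observation(params, use_self_channel=True)

print(obs_a.shape, obs_b.shape)  # (4,) vs (7,)
```

The point of the sketch is only that the 'look at itself' channel is something the designer plumbs in (or not) when defining what the learned model is fit on.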
Wikipedia has the following definition of AGI:
Artificial general intelligence (AGI) is the hypothetical ability of an intelligent agent to understand or learn any intellectual task that a human being can.
Though there is plenty of discussion on this forum which silently assumes
otherwise, there is no law of nature which says that, when I build a useful AGI-level AI, I must necessarily create the entire package of all human cognitive abilities inside of it.
this made me curious about what we could do with an advanced model that is instructed to not learn and also whether we can even define and ensure a model stops learning.
Terminology note if you want to look into this some more:
ML typically does not frame this goal as ‘instructing the model not to
learn about Q’. ML would frame this as ‘building the model to
approximate the specific relation P(X|Y,Z) between some well-defined
observables, and this relation is definitely not Q’.
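For illustration, a minimal sketch of that framing, assuming toy observables X, Y, Z and an extra quantity Q generated with numpy; the model is fit only on (Y, Z), so by construction it has no input channel through which it could learn about Q.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy observables: Y and Z are the only inputs the model is built to
# condition on; X is the target. Q (some other quantity the data-generating
# process also depends on) simply has no input channel.
n = 1000
Y = rng.normal(size=n)
Z = rng.normal(size=n)
Q = rng.normal(size=n)  # never exposed to the model
X = 2.0 * Y - 0.5 * Z + 0.1 * Q + rng.normal(scale=0.1, size=n)

# Approximate the specific relation P(X | Y, Z) -- here just a
# least-squares linear model, purely to illustrate the framing.
features = np.column_stack([Y, Z, np.ones(n)])
coef, *_ = np.linalg.lstsq(features, X, rcond=None)

print("learned coefficients for [Y, Z, bias]:", np.round(coef, 2))
# Whatever influence Q has shows up only as residual noise; the model has
# no representation of Q because Q was never one of its observables.
```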