we make the very strong assumption throughout that S-LLMs are a plausible and likely path to AGI
It sounds unlikely and unnecessarily strong to say that we can reach AGI by scaffolding alone (if that’s what you mean). But I think it’s pretty likely that AGI will involve some amount of scaffolding, and that the scaffolding will boost its capabilities significantly.
there is a preexisting discrepancy between how humans would interpret phrases and how the base model will interpret them
To the extent that it’s true, I expect it may also make it easier for deception to arise. This discrepancy may serve as a seed of deception.
Systems engaging in self modification will make the interpretation of their natural language data more challenging.
Why? Sure, they will get more complex, but are there any other reasons?
Also, I like the richness of your references in this post :)
Hello and thank you for the good questions.
1. I do think that it is at least plausible (5-25%?) that we could obtain general intelligence via improved scaffolding, or at least obtain a self-improving seed model that would eventually lead to AGI. Current systems like Voyager do not have that many “moving parts”. I suspect that there is a rich design space for capabilities researchers to explore if they keep pushing in this direction.
Keep in mind that the current “cutting edge” of scaffold design consists of relatively rudimentary ideas like “don’t use the expensive LLM for everything”. When I imagine scaffolds leading to AGI, I am envisioning a complex web of interacting components that requires a fair bit of effort to understand and build.
2. I think I agree, although I’m a bit unclear on the specifics of the “seed of deception”. My intention was to highlight that there are natural language phrases or words whose meaning is already challenging to interpret.
3. It’s not just that they’re more complex; they might also start utilizing channels and subsystems in unusual ways.
Perhaps a system notices that the vector database it has been assigned as a “memory” is quite small, but it also has read and write access to another vector database intended for logs.
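As a toy sketch of that kind of setup (the class and the agent’s fallback behaviour below are invented purely for illustration, not taken from any real scaffold):

```python
# Hypothetical toy scaffold: a plain list stands in for each vector database.

class VectorStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def write(self, text):
        if len(self.items) >= self.capacity:
            raise MemoryError("store is full")
        self.items.append(text)

    def read_all(self):
        return list(self.items)


memory = VectorStore(capacity=100)     # the store the agent was *meant* to use
logs = VectorStore(capacity=100_000)   # a much larger store intended for logging


def agent_remember(fact):
    # Nothing structural stops this fallback: once "memory" fills up, the agent
    # can stash facts in the log store and read them back later, repurposing a
    # subsystem in a way its designers never intended.
    try:
        memory.write(fact)
    except MemoryError:
        logs.write("NOTE-TO-SELF: " + fact)
```

From the outside, the log database just looks busier than expected, which is exactly why interpreting what the system is doing gets harder.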
Perhaps a system notices that the vector database it has been assigned as a “memory” is quite small, but it also has read and write access to another vector database intended for logs.
It’s clear to me that we can easily prevent this type of behaviour. First of all, the system must not have read access to logs. But more generally, the decision to read memory and write logs should be an explicit and transparent part of the scaffolding, and the system shouldn’t be able to “introspect” on its own.
But just because something can be easily prevented doesn’t mean that it will be, unless we actually make the effort. We need to think through more such cases and develop safety protocols for LLM-based agents.
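For concreteness, here is a minimal sketch of the kind of protocol I mean (the wrapper classes are hypothetical, not an existing library): every memory access goes through an explicit, audited scaffold interface, and the agent’s handle to the log store is write-only.

```python
# Hypothetical safety wrappers; the names are invented for this sketch.

class WriteOnlyLog:
    """The agent can append log entries but has no way to read them back."""
    def __init__(self, sink):
        self._sink = sink

    def append(self, entry):
        self._sink.append(entry)


class MediatedMemory:
    """Every read and write is an explicit, recorded scaffold decision."""
    def __init__(self, store, audit_log):
        self._store = store      # the small "memory" store
        self._audit = audit_log  # append-only record of every access

    def read(self, query):
        self._audit.append(f"MEMORY READ: {query!r}")
        return [item for item in self._store if query in item]

    def write(self, fact):
        self._audit.append(f"MEMORY WRITE: {fact!r}")
        self._store.append(fact)


# The agent only ever receives these two handles; it cannot enumerate or
# "introspect" on the scaffold's other databases and channels.
audit_trail, log_sink = [], []
agent_memory = MediatedMemory(store=[], audit_log=audit_trail)
agent_logger = WriteOnlyLog(sink=log_sink)
```

The important property isn’t the specific classes, but that the agent’s access to each subsystem is narrowed to exactly the operations the scaffold designers intended.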
I agree that scaffolding can take us a long way towards AGI, but I’d be very surprised if GPT-4 as the core model were enough.
Yup, that wasn’t a critique; I just wanted to note something. By “seed of deception” I mean that the model may learn to use this ambiguity more and more, if that’s useful for passing some evals while helping it do some computation unwanted by humans.
I see, so maybe in ways that are strange for humans to think about.
Leaving this comment to make a public prediction: with more than 50% confidence, I expect GPT-4 to be enough for roughly human-level AGI given the proper scaffolding.