Can you visualize an agent that is not “open-ended” in the relevant ways, but is capable of, say, building nanotech and melting all the GPUs?
In my picture most of the extra sauce you’d need on top of GPT-3 looks very agenty. It seems tricky to name “virtual worlds” in which AIs manipulate just “virtual resources” and still manage to do something like melting the GPUs.
maybe a reasonable path forward is to try to wring as much productivity as we can out of the passive, superhuman, quasi-oracular just-dumb-data-predictors. And avoid as much as we can ever creating closed-loop, open-ended, free-rein agents.
I should say that I do see this as a reasonable path forward! But we don’t seem to be coordinating to do this, and AI researchers seem to love doing work on open-ended agents, which sucks.
Hm, regardless, it doesn’t really move the needle, so long as people are publishing all of their work. Developing overpowered pattern recognizers is similar to increasing our level of hardware overhang. People will end up using them as components of systems that aren’t safe.
I strongly disagree. Gain-of-function research happens, but it’s rare because people know it’s not safe. To put it mildly, I think reducing the number of dangerous experiments substantially improves the odds of no disaster happening over any given time frame.
> Can you visualize an agent that is not “open-ended” in the relevant ways, but is capable of, say, building nanotech and melting all the GPUs?
FWIW, I’m not sold on the idea of taking a single pivotal act. But, engaging with what I think is the real substance of the question — can we do complex, real-world, superhuman things with non-agent-y systems?
Yes, I think we can! Just as current language models can be prompt-programmed into solving arithmetic word problems, I think a future system could be led to generate a GPU-melting plan, without it needing to be a utility-maximizing agent.
For a very hand-wavy sketch of how that might go, consider asking GPT-N to generate 1000s of candidate high-level plans, then rate them by feasibility, then break each plan into steps and re-evaluate, etc.
Or, alternatively, imagine the cognitive steps you might take if you were trying to come up with a GPU-melting plan (or a pivotal-act plan in general). Do any of those steps really require that you have a utility function or that you’re a goal-directed agent?
It seems to me that we need some form of search, discrimination, and optimization. But not necessarily any more than GPT-3 already has. (It would just need to be better at the search. And we’d need to make many, many passes through the network to complete all the cognitive steps.)
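To make the hand-waving slightly more concrete, here is a minimal Python sketch of that generate/rate/expand loop, under the assumption that we have some `complete(prompt) -> text` callable wrapping GPT-N (a hypothetical stand-in, not a real API). The prompts, the 0-to-10 rating scale, and the cutoffs are made up for illustration; the point is only that each step is one more pass through a passive predictor, with the looping and filtering done by ordinary outside code rather than by an agent.

```python
from typing import Callable, List, Tuple


def propose_plans(complete: Callable[[str], str], n: int) -> List[str]:
    """Sample n candidate high-level plans, each as plain text."""
    return [complete("Propose one high-level plan for the task:\n") for _ in range(n)]


def rate_feasibility(complete: Callable[[str], str], plan: str) -> float:
    """Ask the same passive model to score a plan's feasibility from 0 to 10."""
    answer = complete(f"Rate the feasibility of this plan from 0 to 10:\n{plan}\nScore:")
    try:
        return float(answer.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0  # treat unparseable ratings as infeasible


def expand_plan(complete: Callable[[str], str], plan: str) -> List[str]:
    """Break a high-level plan into concrete steps, one step per line."""
    steps = complete(f"Break this plan into concrete steps, one per line:\n{plan}\n")
    return [line for line in steps.splitlines() if line.strip()]


def search(complete: Callable[[str], str],
           n_candidates: int = 1000,
           keep_top: int = 10) -> List[Tuple[str, List[str]]]:
    """Generate many plans, keep the most feasible ones, and expand those into steps."""
    plans = propose_plans(complete, n_candidates)
    ranked = sorted(plans, key=lambda p: rate_feasibility(complete, p), reverse=True)
    return [(plan, expand_plan(complete, plan)) for plan in ranked[:keep_top]]
```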
On your view, what am I missing here?
Is GPT-3 already more of an agent than I realize? (If so, is it dangerous?)
Will GPT-N by default be more of an agent than GPT-3?
Are our own thought processes making use of goal-directedness more than I realize?
Will prompt-programming passive systems hit a wall somewhere?
If so, what are some of the simplest cognitive tasks that we can do that you think such systems wouldn’t be able to do?
(See also my similar question here.)
> For a very hand-wavy sketch of how that might go, consider asking GPT-N to generate 1000s of candidate high-level plans, then rate them by feasibility, then break each plan into steps and re-evaluate, etc.
FWIW, I’d call this “weakly agentic” in the sense that you’re searching through some options, but the number of options you’re looking through is fairly small.
It’s plausible that this is enough to get good results and also avoid disasters, but it’s actually not obvious to me. The basic reason: if the top 1000 plans are good enough to get superior performance, they might also be “good enough” to be dangerous. While it feels like there’s some separation between “useful and safe” plans and “dangerous” plans, and this scheme might yield only plans of the former type, I don’t presently see a strong reason to believe that this is true.
Separately from whether the plans themselves are safe or dangerous, I think the key question is whether the process that generated the plans is trying to deceive you (so it can break out into the real world or whatever).
If it’s not trying to deceive you, then it seems like you can just build in various safeguards (like asking, “is this plan safe?”, as well as more sophisticated checks), and be okay.
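For concreteness, a sketch of what those safeguards might look like as an outer filter, with the caveat above that this only helps if the plan-generating process isn’t trying to deceive you. The `complete` callable is again a hypothetical stand-in for a GPT-N call, and the “is this plan safe?” prompt and the hook for more sophisticated checks are illustrative assumptions, not a claim that such checks would be sufficient.

```python
from typing import Callable, List, Sequence


def model_says_safe(complete: Callable[[str], str], plan: str) -> bool:
    """Crude first-pass check: ask the model directly whether the plan is safe."""
    answer = complete(
        f"Is the following plan safe for humans to execute? Answer yes or no.\n{plan}\nAnswer:"
    )
    return answer.strip().lower().startswith("yes")


def filter_plans(complete: Callable[[str], str],
                 plans: List[str],
                 extra_checks: Sequence[Callable[[str], bool]] = ()) -> List[str]:
    """Keep only plans that pass the model's own safety check and every extra check."""
    return [
        plan for plan in plans
        if model_says_safe(complete, plan) and all(check(plan) for check in extra_checks)
    ]
```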
> then rate them by feasibility,
I mean, literal GPT is just going to have poor feasibility ratings for novel engineering concepts.
> Do any of those steps really require that you have a utility function or that you’re a goal-directed agent?
Yes, obviously. You have to make many scientific and engineering discoveries, which involves goal-directed investigation.
> Are our own thought processes making use of goal-directedness more than I realize?
Yes: you know which ideas make sense by generalizing from ideas more closely tied in with the actions you take that are directed towards living.