Can you visualize an agent that is not “open-ended” in the relevant ways, but is capable of, say, building nanotech and melting all the GPUs?
FWIW, I’m not sold on the idea of taking a single pivotal act. But, engaging with what I think is the real substance of the question — can we do complex, real-world, superhuman things with non-agent-y systems?
Yes, I think we can! Just as current language models can be prompt-programmed into solving arithmetic word problems, I think a future system could be led to generate a GPU-melting plan, without it needing to be a utility-maximizing agent.
For a very hand-wavy sketch of how that might go, consider asking GPT-N to generate 1000s of candidate high-level plans, then rate them by feasibility, then break each plan into steps and re-evaluate, etc.
Or, alternatively, imagine the cognitive steps you might take if you were trying to come up with a GPU-melting plan (or a pivotal-act plan more generally). Do any of those steps really require that you have a utility function or that you’re a goal-directed agent?
It seems to me that we need some form of search, discrimination, and optimization, but not necessarily any more than GPT-3 already has. (It would just need to be better at the search, and we’d need to make many, many passes through the network to complete all the cognitive steps.)
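To make that hand-waving slightly more concrete, here’s a toy sketch of the outer loop I have in mind. Everything in it is a stand-in: `generate` is a placeholder for a single GPT-N completion (not a real API), and the function names, prompts, and 0–10 scoring scheme are made up for illustration.

```python
def generate(prompt: str) -> str:
    """Placeholder for one GPT-N completion (hypothetical, not a real API).
    Returns a canned answer so the sketch runs end to end; swap in a real model call."""
    return "5  placeholder plan text"

def rate(plan: str, criterion: str = "feasibility") -> float:
    """Ask the model to score a plan from 0 to 10 and parse the number it gives back."""
    reply = generate(f"Rate the {criterion} of this plan from 0 to 10.\n\nPlan:\n{plan}\n\nScore:")
    try:
        return float(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0

def plan_search(goal: str, n_candidates: int = 1000, n_keep: int = 10, rounds: int = 3) -> str:
    """Generate many candidate plans, keep the best-rated few, expand them into steps,
    and re-rate; a dumb outer loop where all the cognition lives in the individual passes."""
    plans = [generate(f"Propose a high-level plan for: {goal}\n\nPlan:") for _ in range(n_candidates)]
    for _ in range(rounds):
        plans.sort(key=rate, reverse=True)   # discriminate among candidates
        plans = plans[:n_keep]               # select the most promising few
        plans = [generate(f"Break this plan into concrete steps:\n\n{p}\n\nSteps:")
                 for p in plans]             # refine, then re-rate next round
    return max(plans, key=rate)
```

Note that the outer loop itself has no goals and no world-model; it just samples, scores, and keeps the top few completions.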
On your view, what am I missing here?
Is GPT-3 already more of an agent than I realize? (If so, is it dangerous?)
Will GPT-N by default be more of an agent than GPT-3?
Are our own thought processes making use of goal-directedness more than I realize?
Will prompt-programming passive systems hit a wall somewhere?
If so, what are some of the simplest cognitive tasks that we can do that you think such systems wouldn’t be able to do?
(See also my similar question here.)
>For a very hand-wavy sketch of how that might go, consider asking GPT-N to generate 1000s of candidate high-level plans, then rate them by feasibility, then break each plan into steps and re-evaluate, etc.
FWIW, I’d call this “weakly agentic” in the sense that you’re searching through some options, but the number of options you’re looking through is fairly small.
It’s plausible that this is enough to get good results while also avoiding disasters, but it’s actually not obvious to me. The basic reason: if the top 1000 plans are good enough to get superior performance, they might also be “good enough” to be dangerous. It feels like there’s some separation between “useful and safe” plans and “dangerous” plans, and that this scheme might yield only plans of the former type, but beyond that intuition I don’t presently see a strong reason to believe it’s true.
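To put a very rough number on “fairly small” (my framing, not something the scheme itself commits to): selecting the single best of $N$ independently sampled plans exerts on the order of $\log_2 N$ bits of optimization over the base distribution, so one pass over the top 1000 plans is only about

$$\log_2 1000 \approx 10 \text{ bits of selection,}$$

which is modest on its own, though repeated rounds of selecting and refining can compound it. That’s part of why it’s not obvious to me where “enough search to be useful” ends and “enough search to be dangerous” begins.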
Separately from whether the plans themselves are safe or dangerous, I think the key question is whether the process that generated the plans is trying to deceive you (so it can break out into the real world or whatever).
If it’s not trying to deceive you, then it seems like you can just build in various safeguards (like asking, “is this plan safe?”, as well as more sophisticated checks), and be okay.
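For concreteness, the kind of crude bolt-on check I mean is sketched below; `generate` is again a hypothetical stand-in for a GPT-N completion call, the function names and prompt wording are invented for illustration, and this only buys you anything under the assumption above that the model isn’t trying to deceive you.

```python
def generate(prompt: str) -> str:
    # Stand-in for a GPT-N completion call (hypothetical, not a real API);
    # returns a canned answer so the sketch runs.
    return "yes"

def looks_safe(plan: str) -> bool:
    # Crude first-pass safeguard: ask the model whether the plan is safe and
    # keep only plans it answers "yes" to. More sophisticated checks
    # (red-teaming prompts, human review of top candidates, etc.) would slot
    # in alongside this one.
    verdict = generate(f"Is the following plan safe for humans to execute? Answer yes or no.\n\n{plan}\n\nAnswer:")
    return verdict.strip().lower().startswith("yes")

def filter_plans(candidate_plans: list[str]) -> list[str]:
    # Drop any candidate that fails the check before it goes anywhere near execution.
    return [plan for plan in candidate_plans if looks_safe(plan)]
```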
>then rate them by feasibility,
I mean, literal GPT is just going to have poor feasibility ratings for novel engineering concepts.
>Do any of those steps really require that you have a utility function or that you’re a goal-directed agent?
Yes, obviously. You have to make many scientific and engineering discoveries, which involves goal-directed investigation.
> Are our own thought processes making use of goal-directedness more than I realize?
Yes. You know which ideas make sense by generalizing from ideas that are more closely tied to the actions you take directed towards living.