I am assuming by myopic you mean the model weights are frozen, similar to GPT-4 as it is today.
The fundamental issue with this is that the maximum capability the model can exhibit is throttled by the maximum number of tokens that can fit in the context window.
You can think of some of those tokens as pulling a Luigi out of superposition so it is maximally effective at a task (“think it through step by step”, “are you sure”, “express your reasoning before the answer”), while the rest have to contain context for the current subtask.
The issue is that this just caps out: you probably can’t express enough information this way for the model to “not miss”, so to speak. It will keep making basic errors forever because it cannot learn from its mistakes, and anything added to the prompt to prevent an error costs a token that could have been spent on something more valuable.
You can think of every real-world task as having all sorts of hidden “gotchas” and edge cases that are illogical: the DNA printer needs a different format for some commands, the stock trading interface breaks UI conventions in a couple of key places, humans keep hiding from your killer robots with the same trick that works every time.
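To make the arithmetic concrete, here is a toy sketch of that budget problem in Python; the window size, the `count_tokens` helper, and the example fixes are all made up for illustration, not taken from any real system:

```python
# Toy sketch of the context-budget argument. The numbers and the
# count_tokens() helper are invented for illustration only.

CONTEXT_WINDOW = 8_192  # total tokens the frozen model can attend to

def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; assume ~4 characters per token.
    return max(1, len(text) // 4)

elicitation_prompt = (
    "Think it through step by step. Express your reasoning before the answer. "
    "Are you sure?"
)

# Every gotcha discovered has to be patched in the prompt, forever,
# because the weights never change.
accumulated_fixes = [
    "The DNA printer needs format B for some commands.",
    "The trading UI breaks its own conventions on the confirm screen.",
    # ...grows with every mistake the model is not allowed to learn from
]

overhead = count_tokens(elicitation_prompt) + sum(count_tokens(f) for f in accumulated_fixes)
budget_for_task = CONTEXT_WINDOW - overhead
print(f"Tokens left for the actual subtask: {budget_for_task}")
```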
Obviously, a model that can update its weights as it performs tasks, especially when a testable (prediction, outcome) pair arises naturally from the model using tools, won’t have this issue. Nvidia is already offering models that end customers will be able to train with unlocked weights, so this limitation will be brief.
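A minimal sketch of what that online-update loop might look like, assuming a causal LM with unlocked weights being fine-tuned on (prediction, outcome) pairs harvested from tool calls; the model name and the `run_tool` helper are placeholders, not a real pipeline:

```python
# Sketch only: online weight updates from tool-use outcomes.
# The model name, run_tool(), and the data format are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder for an unlocked-weights model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def run_tool(command: str) -> str:
    """Placeholder for a real tool call (DNA printer, trading API, ...)."""
    return "observed outcome"

def learn_from_interaction(prompt: str, command: str) -> None:
    outcome = run_tool(command)
    # Turn the (prediction, outcome) pair into a training example, so the
    # next time this gotcha appears it lives in the weights, not the prompt.
    text = f"{prompt}\nCommand: {command}\nOutcome: {outcome}"
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```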
By myopic I mean https://www.lesswrong.com/tag/myopia — that it was trained to predict the next token and doesn’t get much lower loss from having goals about anything longer-term than predicting the next token correctly.
I assume the weights are frozen; I’m surprised to see this as a question.
Some quick replies off the top of my head: if GPT-7 has a much larger context window, or if the kinds of prompts the dynamic converges to aren’t too long; if you get an AGI that’s smart and goal-oriented and needs to spend some of its space to support its level (or that happens naturally, because the model keeps outputting what an AGI that smart would be doing); and if how smart an AGI simulated by that LLM can be isn’t capped at some low level, then I don’t think there’s any issue with it using notes until it gets access to something outside, which would let it be more of an AutoGPT with external memory and everything. If it utilises the model’s knowledge, it might figure out what text it can output that hacks the server where the text is stored and processed; or it can understand humans and design a text that hacks their brains when they look at it.
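A rough sketch of that notes/external-memory loop, with a hypothetical `llm_complete` function standing in for the frozen model behind an API; everything here is illustrative:

```python
# Sketch of an AutoGPT-style loop with external memory, so the frozen
# context window stops being the binding constraint. llm_complete() is
# a hypothetical stand-in for whatever serves the frozen model.
import json
from pathlib import Path

NOTES = Path("agent_notes.json")

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("call the frozen model here")

def load_notes() -> list[str]:
    return json.loads(NOTES.read_text()) if NOTES.exists() else []

def save_note(note: str) -> None:
    notes = load_notes()
    notes.append(note)
    NOTES.write_text(json.dumps(notes))

def step(task: str) -> str:
    # Only a summary of recent lessons goes back into the context window;
    # the full history lives outside the model.
    memory = "\n".join(load_notes()[-20:])
    reply = llm_complete(f"Notes so far:\n{memory}\n\nTask: {task}\nNext action:")
    save_note(f"Tried: {reply}")
    return reply
```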