On the other hand, if it hasn’t been trained on a bunch of statements about angular momentum, and it can then, given some examples and time to think, correctly answer questions about angular momentum, that would be surprising and impressive. Maybe this could be experimentally tested, though I guess at great cost, by training an LLM on a dataset that’s been scrubbed of all mention of anything related to angular momentum (disallowing math about angular momentum, but allowing math and discussion about momentum and about rotation), and then trying to prompt it so that it can correctly answer questions about angular momentum. Like, the point here is that angular momentum is a “new thing under the sun” in a way that “red and smaller than a microwave” is not a new thing under the sun.
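To make the “scrubbed” part concrete, here is a minimal sketch of what such a filter could look like, assuming a plain keyword/regex pass over training documents; the patterns and the example `docs` list are purely illustrative, not a description of how anyone actually builds such a dataset:

```python
import re

# Hypothetical sketch of the corpus-scrubbing step: drop any training document
# that mentions angular momentum (or near-synonyms), while keeping documents
# that discuss plain momentum or rotation. The patterns and the example `docs`
# list are illustrative placeholders, not a real data pipeline.
BANNED_PATTERNS = [
    r"angular\s+momentum",
    r"moment\s+of\s+momentum",
    r"L\s*=\s*I\s*ω",   # the defining formula L = Iω
    r"r\s*×\s*p",       # the cross-product form r × p
]
banned_re = re.compile("|".join(BANNED_PATTERNS), re.IGNORECASE)

def scrub(corpus):
    """Yield only documents with no detectable mention of angular momentum."""
    for doc in corpus:
        if not banned_re.search(doc):
            yield doc

docs = [
    "Torque causes rotation about a fixed axis.",
    "Angular momentum is conserved in a closed system.",
    "Momentum p = mv is conserved in collisions.",
]
print(list(scrub(docs)))  # the angular-momentum document is dropped
```

A real attempt would presumably need much more aggressive filtering (paraphrases, translations, derivations that implicitly use the concept), which is part of why the experiment would be so costly.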
Until recently, I thought that the fact that LLMs are not strong and efficient online (or quasi-online, i.e., needing only a few examples) conceptual learners is a “big obstacle” for AGI or ASI. I no longer think so. Yes, humans evidently still have an edge here: humans can somehow relatively quickly and efficiently “surgeon” their world models to accommodate new concepts and use them effectively in a far-ranging way. (Even though I suspect that we over-glorify this ability in humans, and it more realistically takes weeks or even months, rather than hours, for humans to fully integrate new conceptual frameworks into their thinking, still, they can do so without many external examples, which will be lacking if the concept is genuinely new.)
I no longer think this handicaps LLMs much. New powerful concepts that permeate practical and strategic reasoning in the real world are rarely invented and spread through society slowly. Just being a skillful user of existing concepts that are amply described in books and elsewhere in the training corpus of LLMs should be enough for gaining the capacity for recursive self-improvement, and, more generally, for quite far superhuman intelligence/strategy/agency.
Then, imagine that superhuman LLM-based agents “won” and killed all humans. Even if they themselves don’t (or couldn’t!) invent ML paradigms for efficient online concept learning, they could still sort of hack their way through it: experimenting with new concepts, running a lot of simulations with them, checking these simulations against reality (filtering out incoherent/bad concepts), re-training themselves on the results of these simulations, and then giving text labels to the features found in their own DNNs to mark the corresponding concepts.
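As a very rough sketch of the loop I have in mind (every function name here is a hypothetical placeholder for a capability the paragraph only gestures at; this is not an implementation of any existing system or API):

```python
# Schematic sketch of the "hack through it" concept-acquisition loop.
# All methods (propose_candidate_concept, run_simulations,
# consistent_with_reality, retrain_on, label_internal_features) are
# hypothetical placeholders, not real library calls.

def concept_acquisition_loop(model, world_interface, n_rounds=10):
    for _ in range(n_rounds):
        # 1. Propose a tentative new concept (e.g. as a natural-language definition).
        concept = model.propose_candidate_concept()

        # 2. Run many simulations/rollouts that make use of the concept.
        simulations = model.run_simulations(concept, n=1000)

        # 3. Keep only simulations whose predictions hold up against observation,
        #    discarding incoherent or useless concepts along the way.
        validated = [s for s in simulations
                     if world_interface.consistent_with_reality(s)]
        if not validated:
            continue  # concept filtered out as incoherent/bad

        # 4. Re-train (fine-tune) the model on the validated simulation traces.
        model = model.retrain_on(validated)

        # 5. Attach a text label to the internal features that now encode the
        #    concept, so later reasoning can refer to it explicitly.
        model.label_internal_features(concept)
    return model
```

The point is just that slow, batch-style re-training plus self-labeling could substitute for genuinely efficient online concept learning, at the cost of a lot of compute.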
I don’t think they’re skilled users of existing concepts. I’m not saying it’s an “obstacle”; I’m saying that this behavior pattern would be a significant indicator to me that the system has properties that make it scary.