Current LLMs require huge amounts of data and compute to be trained.
Well, newer/larger LLMs seem to unexpectedly gain new capabilities, so it’s possible that future LLMs (e.g., GPT-5, GPT-6, etc.) could have a vastly improved ability to understand how LLM weights map to functions and actions. Maybe the only reason humans need to train new models “from scratch” is that we don’t have the brainpower to understand how the weights in these LLMs work. Humans are naturally limited in their ability to conceptualize and manipulate massive multi-dimensional spaces, and maybe that’s the bottleneck when it comes to interpretability.
Future LLMs could solve this problem and then be able to update their own weights or the weights of other LLMs. That ability could be used to quickly and efficiently expand the training data, knowledge, understanding, and capabilities of themselves or of other LLM versions, and then… foom!
A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments.
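For concreteness, here is a toy sketch (entirely hypothetical, and nothing like a real LLM) of what “adjusting weights in a targeted way” would mean: in a model small enough to interpret fully, you know which weight controls which behavior, so you can edit one number to change one behavior without retraining anything.

```python
# Toy illustration of "targeted weight editing": a 2x2 linear "model"
# maps inputs to outputs, and because we fully understand the weights,
# we can edit a single one to change a single behavior.

def forward(W, x):
    """Tiny linear model: returns W @ x for a 2x2 weight matrix."""
    return [W[0][0] * x[0] + W[0][1] * x[1],
            W[1][0] * x[0] + W[1][1] * x[1]]

W = [[1.0, 0.0],
     [0.0, 1.0]]           # identity weights: the model echoes its input

x = [3.0, 5.0]
print(forward(W, x))        # [3.0, 5.0]

# "Solved interpretability" is trivial here: W[1][1] is exactly the weight
# that routes x[1] to output 1. A targeted edit doubles that behavior only.
W[1][1] = 2.0
print(forward(W, x))        # [3.0, 10.0] -- output 0 is untouched
```

The hard part, of course, is that a frontier model has hundreds of billions of weights with no such clean one-weight-one-behavior mapping, which is why doing this for real would plausibly require many compute-intensive experiments.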
Yes, exactly this.
While it’s true that this could require “a lot of compute-intensive experiments,” that’s not necessarily a barrier. OpenAI is already planning to reserve 20% of its compute for an LLM to do “alignment” on other LLMs, as part of its Superalignment project.
As part of this process, we can expect the alignment LLM to be “running a lot of compute-intensive experiments” on another LLM, and humans are not likely to have any idea what those experiments are doing. They could, for example, be adjusting the other LLM’s weights to vastly increase its training data, knowledge, intelligence, capabilities, etc., along with the insights needed to similarly update the weights of other LLMs. Then those gains could be fed back into the Superalignment LLM, then back into the “training” LLM… and back and forth, and… foom!
Super-human LLMs running RL(M)F and “alignment” on other LLMs, using only “synthetic” training data… What could go wrong?