An actual improvement to, say, how Transformers work would help with speech recognition, language modelling, image recognition, image segmentation, and so on. Improvements to AI-relevant hardware are a trillion-dollar business. Work compounds so easily on other work that many alignment-concerned people want to conduct all AI research in secret.
This section feels like it misunderstands what Yudkowsky is trying to say here, though I am not confident. I expected this point not to be about "what happens if you find an improvement to Transformers in general" but about "what happens if you find an improvement in your models of the world and the way you solve problems". The relevant "source code" in this metaphor is not the Transformer architecture, but the weights of a current neural network, whose training process is performing a search over a large space of possible programs that could be encoded by different neural weights.
And indeed, a gradient update computed for one network is approximately useless when applied to another network (you can do some model averaging, but I would take large bets that this degrades performance a lot in large language models). That makes this seem like an accurate prediction: one LLM figuring out how to reason better is unlikely to translate into another LLM thinking better.
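To make the model-averaging point concrete, here is a minimal sketch (toy task, tiny MLPs, and all specific choices are my own illustration, not anything from the original post): two networks are trained independently on the same task, and their weights are naively averaged. The averaged model typically does much worse than either parent, because the two runs land in different, non-interchangeable regions of weight space.

```python
# Illustrative sketch: naive weight averaging of two independently trained networks.
# Toy task and architecture are assumptions for the sake of the example.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Shared toy regression task: y = sin(3x).
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(3 * x)

def make_net():
    return nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

def train(net, steps=2000, lr=1e-2):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    return net

net_a = train(make_net())
net_b = train(make_net())  # same task, different random initialisation

# Average the parameters of the two trained networks, element-wise.
net_avg = make_net()
with torch.no_grad():
    for p_avg, p_a, p_b in zip(net_avg.parameters(), net_a.parameters(), net_b.parameters()):
        p_avg.copy_((p_a + p_b) / 2)

for name, net in [("net A", net_a), ("net B", net_b), ("averaged", net_avg)]:
    mse = nn.functional.mse_loss(net(x), y).item()
    print(f"{name}: MSE = {mse:.4f}")
# Typical outcome: the averaged network's error is far higher than either parent's,
# because hidden units in the two runs are permuted/scaled relative to each other,
# so their weights do not line up neuron-by-neuron.
```

The same intuition is why one network's learned improvements do not straightforwardly transfer to another network by copying or averaging parameters, even when both were trained on the same objective.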