I think an important point missing from the discussion on compute is training vs inference: you can totally get a state-of-the-art language model performing inference on a laptop.
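For concreteness, here's a minimal sketch of what "inference on a laptop" looks like in practice, assuming the llama-cpp-python bindings and a 4-bit-quantized LLaMA file you already have locally (the model path is a placeholder, not a specific released artifact):

```python
# Minimal local-inference sketch: run a quantized LLaMA checkpoint on a laptop CPU.
# Assumes `pip install llama-cpp-python` and a quantized weights file on disk.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-7b.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,    # context window
    n_threads=8,   # CPU threads; tune for your machine
)

output = llm(
    "Q: Name three uses for a language model.\nA:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```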
This is a slight point in favor of Yudkowsky: thinking is cheap, finding the right algorithm (including weights) is expensive. Right now we’re brute-forcing the discovery of this algorithm using a LOT of data, and maybe it’s impossible to do any better than brute-forcing. (Well, the human brain can do it, but I’ll ignore that.)
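The asymmetry in rough numbers, using the standard rules of thumb for dense transformers (training ≈ 6 × parameters × tokens FLOPs, inference ≈ 2 × parameters FLOPs per generated token; these are generic approximations, not figures from the debate):

```python
# Back-of-envelope cost asymmetry for a dense transformer.
params = 7e9   # e.g. a 7B-parameter model
tokens = 1e12  # ~1 trillion training tokens (LLaMA-scale)

train_flops = 6 * params * tokens       # ~4.2e22 FLOPs to "find" the weights
infer_flops_per_token = 2 * params      # ~1.4e10 FLOPs to "think" one token

print(f"training:            {train_flops:.1e} FLOPs")
print(f"inference per token: {infer_flops_per_token:.1e} FLOPs")
print(f"ratio:               {train_flops / infer_flops_per_token:.1e}")
# Discovering the weights costs ~3e12 times as much as running them for one token.
```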
Could you run an LLM on a desktop from 2008? No. But once the algorithm is “discovered” by a large computer, it can be run on consumer hardware instead of a supercomputer, and I think that points towards Yudkowsky’s gesture at running AI on consumer hardware rather than Hanson’s gesture at Watson and other programs run on supercomputers.
If there really is no better way to find AI minds than brute-forcing the training of billions of parameters on a trillion tokens, then that points in the direction of Hanson, but I don’t really think that this would have been an important crux for either of them. (And I don’t really think that there aren’t more efficient ways of training.)
On the whole, I think this is more of a wash than a point for Hanson.
So, like, I remain pretty strongly pro Hanson on this point:
I think LLaMA 7b is very cool, but it’s really stretching it to call it a state-of-the-art language model. It’s much worse than LLaMA 65b, which is much worse than GPT-4, which most people think is >100b parameters, as far as I know. I’m using a 12b model right now while working on an interpretability project… and it is just much, much dumber than these big ones.
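To put rough numbers on why it’s the small models that fit: a model’s weights alone take roughly parameters × bytes-per-parameter of memory, so (treating the >100b figure as just a placeholder guess) the footprints look something like this sketch:

```python
# Rough weight-memory footprints at different precisions.
# LLaMA sizes are public; the "100B+" entry is only a placeholder guess.
def weights_gb(params, bytes_per_param):
    return params * bytes_per_param / 1e9

models = {"LLaMA 7B": 7e9, "12B": 12e9, "LLaMA 65B": 65e9, "100B+ (guess)": 100e9}
for name, n in models.items():
    print(f"{name:>14}: {weights_gb(n, 2):6.0f} GB fp16 | {weights_gb(n, 0.5):6.1f} GB 4-bit")
# A quantized 7B (~3.5 GB) fits a 2023 laptop; 65B and up needs workstation or
# server memory even quantized, which is the gap I'm pointing at.
```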
Not being able to train isn’t a small deal, I think; learning over the long term is a big part of intelligence.
Overall, and not to be too glib, I don’t see why fitting a static and subhuman mind into consumer hardware from 2023 means that Yudkowsky doesn’t lose points for saying you can fit a learning (implied) and human-level mind into consumer hardware from 2008.
Because one has nothing to do with the other. LLMs are getting bigger and bigger, but that says nothing about whether a mind designed algorithmically could fit on consumer hardware.