How many symbols are there for it to eat? Are there enough to give the same depth of understanding that a human gets from processing spatial information, for instance?
Yes. It’s not the case that humans blind from birth are dramatically less intelligent; learning from sound and touch is sufficient. LLMs are much less data efficient with respect to external data, because external data is all they learn from. For a human mind, most of the data it learns from is probably to a large extent self-generated, synthetic, so having access to much less external data is not a big issue. For LLMs, there aren’t yet general ways of generating synthetic data that can outright compensate for scarcity of external data and improve their general intelligence the way natural text data does, rather than propping up particular narrow capabilities (and hoping for generalization).
It seems to me that there are arguments to be made in both directions.[1] It’s not clear to me just yet which stance is correct. Maybe yours is! I don’t know.
My point is that it’s understandable for intelligent people to suspect that there isn’t enough data available yet to produce ASI on the current approach. You might disagree, and maybe your disagreement is even correct, but I don’t think the situation is so vividly clear that it’s incomprehensible why many people wouldn’t be persuaded.
As a quick gesture at the point: as far as I know, all the data LLMs are processing has already gone through a processing filter, namely humans. We produced all the tokens they took in as training data. A newborn, even a blind one, doesn’t have this limitation, and I’d expect that a newborn somehow given this limitation could very well have stunted intelligence! I think the analog would be less like a blind newborn and more like a numb one, without tactile or proprioceptive senses.
For a human mind, most data it learns is probably to a large extent self-generated, synthetic, so only having access to much less external data is not a big issue.
Could you say more about this? What do you think is the ratio of external to internal data?