They’re folk theorems, not conjectures. The demonstration is that, in principle, you can go on reducing the loss on predicting human-generated text by spending more and more and more intelligence, far far past the level of human intelligence or even what we think could be computed by using all the negentropy in the reachable universe. There’s no realistic limit on required intelligence inherent in the training problem; any limits on the intelligence of the system come from the limitations of the trainer, not from the loss already being minimized as far as theoretically possible by a moderate level of intelligence. If this isn’t mathematically self-evident then you have not yet understood what’s being stated.
No, I didn’t understand what you said. It seemed like you reduced ML systems to a lookup table in #1. In #2, it seems like you know exactly what is used to train these systems, and that papers from before or after 2010 are somehow a meaningful indicator for ML systems; I don’t know where that reasoning came from. My apologies for not being knowledgeable in this area.
The two examples were (mostly) unrelated and served to demonstrate two cases where a perfect text predictor needs to do incredibly complex calculations to correctly predict text. Thus a perfect text predictor is a vast superintelligence (and we won’t achieve perfect text prediction, but as we get better and better we might get closer to superintelligence).
In the first case, if the training data contains sequences of [hash] followed by [plain text], then a correct predictor must be able to recover the plain text from the hash (and because multiple plain texts can have the same hash, it would have to work through all of them and evaluate which is most probable to appear). Thus correctly predicting text can mean computing an incredibly large number of hashes over all combinations of text of certain lengths and evaluating which match is the most probable.
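To make the first case concrete, here is a minimal sketch of what "predicting the plain text from the hash" amounts to by brute force. It assumes SHA-256 as the hash, a tiny alphabet, and a caller-supplied `prior` function standing in for "which plain text is most probable to appear"; the function names and parameters are hypothetical, chosen only for illustration.

```python
import hashlib
from itertools import product


def candidates(alphabet, max_len):
    """Yield every string over `alphabet` with length 1..max_len."""
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def predict_plaintext(target_hex, alphabet, max_len, prior):
    """Brute-force the most probable plain text whose SHA-256 matches.

    `prior` is an assumed stand-in for how likely each text is to appear;
    it is only consulted for candidates whose hash matches the target.
    Returns (best_text_or_None, number_of_hashes_computed).
    """
    best, best_p = None, 0.0
    hashes_computed = 0
    for text in candidates(alphabet, max_len):
        hashes_computed += 1
        if hashlib.sha256(text.encode()).hexdigest() == target_hex:
            p = prior(text)
            if p > best_p:
                best, best_p = text, p
    return best, hashes_computed
```

The number of hashes computed grows as the sum of len(alphabet)**n for n up to max_len, so even modest text lengths over a realistic alphabet put this search far beyond any feasible amount of computation, which is the point of the example.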
In the second case, the task is to predict future papers based on past papers, which is kinda obviously very hard.
It isn’t clear to me what those two demonstrations are trying to test. #1 seems like a case of over-fitting. #2 seems like an extension of #1 applied to papers; I’m not sure what the papers case has to do with the generalized capabilities of ChatGPT. If you think ChatGPT is merely a complex lookup table, then I don’t really know what to say. Lookup table or NLP, I don’t know how either has much to do with general intelligence. Both are models that may seem intelligent, if that’s what the discussion is focusing on. Honestly, I don’t really understand a lot of the stuff discussed on this site.