Hey Tom, thanks again for your work creating the initial report and for kicking off this discussion. Apologies for the Christmastime delay in reply.
Two quick responses, focused on points of disagreement that aren’t stressed in my original text.
On AI-Generated Synthetic Data:
Breakthroughs in synthetic data would definitely help overcome my dataset quality concerns. Two main obstacles I’d want to see addressed: how will synthetic data retain (1) the fidelity of individual data points to ground truth (how well each one represents the “real world” that the simulation prepares models for), and (2) the higher-level distribution of data points?
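To make those two criteria a bit more concrete, here is a minimal sketch (mine, not from the report, with made-up stand-in data) of how one might check a synthetic dataset against a real reference set: point-level fidelity as error against paired ground-truth values, and distribution-level match via a two-sample Kolmogorov–Smirnov test.

```python
# Hypothetical sketch: two sanity checks for a synthetic dataset against real data.
# Assumes simple 1-D numeric features; real use would need richer, per-feature checks.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=10_000)        # stand-in "ground truth" sample
synthetic = rng.normal(loc=0.05, scale=1.1, size=10_000)  # stand-in generator output

# (1) Point-level fidelity: if each synthetic point is meant to reconstruct a
# specific real point (paired data), measure the per-point error directly.
paired_real = real[:1_000]
paired_synth = paired_real + rng.normal(scale=0.1, size=1_000)  # pretend reconstructions
point_rmse = np.sqrt(np.mean((paired_synth - paired_real) ** 2))

# (2) Distribution-level match: compare the overall shape of the two samples.
ks_stat, p_value = ks_2samp(real, synthetic)

print(f"point-level RMSE:       {point_rmse:.3f}")
print(f"KS statistic / p-value: {ks_stat:.3f} / {p_value:.3g}")
```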
On Abstraction with Scale:
Understanding causality deeply would definitely be useful for predicting next words. However, I don’t think that this potential utility implies that current models have such understanding. It might mean that algorithmic innovations that “figure this out” will outcompete others, but that time may still be yet to come.
I agree, though, that performance definitely improves with scale, and that more frequent deployment brings more data collection and feedback. Time will tell what level of sophistication scale can take us to on its own.
On the latter two points (GDP Growth and Parallelization), the factors you flag are definitely also part of the equation. A higher percentage of GDP invested in AI can increase total investment even if total GDP stays flat. Additional talent coming into AI helps offset the diminishing returns to each additional researcher, even allowing for duplicative efforts and bad investments.
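As a toy illustration of that first point (numbers entirely made up, not from the report): holding GDP flat, doubling the share invested doubles total AI investment.

```python
# Toy arithmetic (illustrative numbers only): investment = share_of_gdp * gdp.
gdp = 25_000  # flat GDP, in billions of dollars
for share in (0.001, 0.002, 0.004):  # 0.1% -> 0.2% -> 0.4% of GDP
    investment = share * gdp
    print(f"share {share:.1%} of a flat ${gdp:,}B GDP -> ${investment:,.0f}B invested")
```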