We are definitely not training AIs on human thoughts because language is an expression of thought, not thought itself. Otherwise nobody would struggle to express their thoughts in language.
My favorite fictional analog of LLMs is Angels from Evangelion. Relatives, yes, but utterly alien relatives.
We are definitely not training AIs on human thoughts because language is an expression of thought, not thought itself.
Even if training on language were not equivalent to training on thoughts, the same would apply to humans, who also learn from other people's language rather than from their thoughts directly.
But it also seems false in the same way that “we are definitely not training AIs on reality because image files are compressed, sampled expressions of reality, not reality itself” is false.
Approximate Bayesian inference (i.e., deep learning) can infer the structure of a function from its outputs, the structure of the 3D world from images, and thoughts from language.
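To make the first case concrete, here is a toy sketch of my own (the target function, architecture, and hyperparameters are arbitrary choices, not anything from the discussion above): a small network recovers a function's behaviour purely from noisy input/output samples, without ever seeing how the function is defined.

```python
# Toy illustration: inferring a function's structure from its outputs alone.
# Assumes PyTorch is available; everything here is an arbitrary example setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hidden "ground truth" the learner only ever sees through its outputs.
def target(x):
    return torch.sin(3 * x) + 0.5 * x

x = torch.linspace(-2, 2, 512).unsqueeze(1)      # inputs
y = target(x) + 0.05 * torch.randn_like(x)       # noisy observed outputs

model = nn.Sequential(
    nn.Linear(1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# The fit generalises to inputs it never saw, i.e. it has picked up the
# function's structure from samples alone.
x_test = torch.tensor([[0.37], [-1.42]])
print(model(x_test).detach(), target(x_test))
```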
My point is not “language is a different form of thought”; it’s “most thoughts are not even expressed in language”. And “being someone who can infer physics from images is very different from being physics”.
How is that even remotely relevant? Humans and AIs learn the same way, via language, and it’s not like this learning process fails just because language undersamples thoughts.
We could include a lot of detailed EEG traces (with speech and video) in the pretraining set, as another modality. I’m not sure doing so would help, but it might. Certainly it would make them better at reading our minds via an EEG.
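For concreteness, here is a hypothetical sketch of what “EEG as another pretraining modality” could look like; every name, dimension, and the crude EEG “tokenizer” below are invented for illustration, not anyone's actual pipeline. The idea is just to project fixed-length EEG windows into the same embedding space as text tokens and feed the combined sequence to one backbone.

```python
# Hypothetical sketch: EEG windows and text tokens in one pretraining sequence.
# All shapes and module choices are assumptions made for this illustration.
import torch
import torch.nn as nn

d_model = 256
vocab_size = 32000
n_eeg_channels = 64        # e.g. a 64-channel cap (assumed)
eeg_window = 200           # samples per window (assumed)

text_embed = nn.Embedding(vocab_size, d_model)

# Crude EEG "tokenizer": one linear projection per fixed-length window.
eeg_embed = nn.Linear(n_eeg_channels * eeg_window, d_model)

encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
backbone = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Fake batch: 16 text tokens plus 8 EEG windows recorded alongside the same speech.
tokens = torch.randint(0, vocab_size, (1, 16))
eeg = torch.randn(1, 8, n_eeg_channels * eeg_window)

sequence = torch.cat([text_embed(tokens), eeg_embed(eeg)], dim=1)  # (1, 24, d_model)
out = backbone(sequence)
print(out.shape)  # torch.Size([1, 24, 256])
```

Whether aligning the two streams this naively would teach the model anything useful is exactly the open question; the sketch only shows that adding the modality is mechanically cheap.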