I think the whole point of AI research is to do something, not to find out how humans do something.
Depends on who’s doing the research and why. You’re right that companies that want to sell software care about solving the problem, which is why that type of approach is so common. On the other hand, I’m reluctant to call a mostly brute-forced solution “AI research”, even if it’s useful computer programming.
When mysterious things cease to be mysterious, they'll tend to resemble the way "X" works.
No, I think you're missing my point. X is uninteresting not because it is no longer mysterious, but because it has no large-scale structure or patterns. We could consider another novel-writing program, Z, that writes novels in some other interesting and complicated way, different from how humans do it, but that still has a rich and detailed structure.
Continuing with the flight analogy: rockets, helicopters, planes, and birds all have interesting ways of flying, whereas the “brute force” approach to flight, throwing a rock really really hard, is not that interesting.
Another example: optical character recognition. One approach is to have a database of hundreds of different fonts, put a grid on each character from each font, and come up with a statistical measure of how close the scanned image is to each stored character by looking at the pixels they have in common. This works and produces useful software, but that approach doesn't actually care about the letterforms and shapes involved. It doesn't recognize that structure, even though that's what the problem is about.
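Here is a minimal sketch of that template-matching idea, just to make it concrete. The grid size, the pixel-overlap measure, and the TEMPLATES store are illustrative assumptions on my part, not any particular product's method:

```python
# A toy sketch of template-matching OCR: compare a binarized glyph against
# stored font renderings and return whichever stored character overlaps best.
import numpy as np

GRID = (16, 16)  # hypothetical grid size for normalized glyph images

# Hypothetical template store: character -> list of binarized font renderings
TEMPLATES: dict[str, list[np.ndarray]] = {
    # "A": [np.array(...), ...],  # one 16x16 boolean array per font
}

def pixel_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of pixels the two glyph images have in common (Jaccard overlap)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def recognize(glyph: np.ndarray) -> tuple[str, float]:
    """Return the stored character whose templates best overlap the glyph.

    Note that this always returns some character, however poor the match,
    which is exactly the failure mode discussed below for unfamiliar symbols.
    """
    best_char, best_score = "?", 0.0
    for char, renderings in TEMPLATES.items():
        for template in renderings:
            score = pixel_overlap(glyph, template)
            if score > best_score:
                best_char, best_score = char, score
    return best_char, best_score
```

Nothing in this sketch knows anything about strokes, ascenders, or why an "A" looks the way it does; it only counts shared pixels.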
Arguably, OCR is about taking a small patch of an image and matching it to a finite set of candidate ground truths. OCR programs can sometimes do this better than most humans, if the only thing you look at is one distorted character.
OCR has traditionally been a difficult problem and there are some novel applications of statistics and heuristics used to solve it. But OCR is not what we actually care about: the problem is recognizing a document, or symbolically representing a sentence, and OCR is just one small problem we’ve carved out to help us deal with the larger problem.
Characters are important when they are part of words and of the larger structure of a document. They are important when they contribute to what the document means, beyond the raw data of the image scan. Situating a character in the context of the word it's in, the sentence that word is in, and the context of the document (an English novel, a handwritten letter from the 18th century, a hastily scribbled medical report from a German hospital in the 1970s) is what allows a human to extrapolate what the character must be, even if the image of the original character is distorted beyond any algorithm's ability to recognize, or even obliterated entirely.
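As a toy illustration of just the shallowest layer of this, word-level context, here is a hypothetical sketch where a dictionary is used to overrule the recognizer's per-character ranking. The word list, the candidate scores, and the function name are all made up:

```python
# A toy sketch of using word context to resolve an ambiguous character,
# assuming the per-character recognizer returns a ranked list of candidates.

WORDLIST = {"the", "toe", "tie"}  # stand-in for a real lexicon

def resolve(prefix: str, suffix: str, candidates: list[tuple[str, float]]) -> str:
    """Pick the candidate character that makes the surrounding word plausible.

    Falls back to the raw image score when no candidate forms a known word.
    """
    for char, _score in sorted(candidates, key=lambda c: -c[1]):
        if (prefix + char + suffix) in WORDLIST:
            return char
    return max(candidates, key=lambda c: c[1])[0]

# A smudged middle letter of "t_e": the image alone slightly favours "b",
# but "tbe" is not a word and "the" is, so context picks "h".
print(resolve("t", "e", [("b", 0.52), ("h", 0.48)]))  # -> "h"
```

Sentence-level and document-level context, the parts humans actually lean on, are precisely what this kind of lookup does not capture.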
It's this effect of context that is hard to capture and encode into an OCR algorithm. This broader sense of being able to recognize a character anywhere a human would, which is the real end goal of the problem, is what my friends refer to as an AI-complete problem. (Apologies if this community also uses that phrase; I haven't yet seen it here on LW.)
To give a specific example, many doctors use the symbol "circle above a cross" to indicate female, and most readers would understand it. Why? We've seen that symbol before, perhaps many times, and understand what it means. If you've trained your OCR algorithm on the standard set of English alphanumeric characters, it will attempt to match that symbol to one of them and come up with the wrong answer. If you've done unsupervised training of an OCR algorithm on a typical corpus of novels, magazines, and newspapers, there is a good chance that the symbol for female does not appear as a cluster in its vector space.
To recognize that symbol as something distinct that needs to be represented in the output, an OCR algorithm would have to do unsupervised online learning as it scans documents in a new domain. Even then, I'm not sure how useful it would be, since the problem is not recognizing a given character; the problem is recognizing what that character should be given the context of the document you're scanning. OCR explodes into specializations: OCR for novels, OCR for 18th-century English letters, OCR for American hospitals, and even more.
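A rough sketch of what that kind of online learning might look like, assuming the recognizer reports a confidence score. The novelty threshold, the similarity measure, and the cluster naming are all assumptions for illustration:

```python
# A toy sketch of "unsupervised online learning" for OCR: when the recognizer's
# best match is too weak, file the glyph under a new cluster of unknown symbols
# instead of forcing the nearest known character.
import numpy as np

NOVELTY_THRESHOLD = 0.6                      # hypothetical cutoff for "something new"
new_clusters: list[list[np.ndarray]] = []    # glyphs of symbols never seen before

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Jaccard overlap of two binarized glyph images."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def accept_or_learn(glyph: np.ndarray, best_char: str, best_score: float) -> str:
    """Keep the recognizer's answer if it is confident; otherwise cluster the glyph."""
    if best_score >= NOVELTY_THRESHOLD:
        return best_char
    # Try to file the glyph under a previously seen unknown symbol.
    for i, cluster in enumerate(new_clusters):
        if similarity(glyph, cluster[0]) >= NOVELTY_THRESHOLD:
            cluster.append(glyph)
            return f"[unknown-symbol-{i}]"
    # Otherwise start a new cluster for it.
    new_clusters.append([glyph])
    return f"[unknown-symbol-{len(new_clusters) - 1}]"
```

Even then, mapping [unknown-symbol-0] to "female" in the output still requires someone to supply the label, which is the hard truth of the next paragraph.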
If we want an OCR algorithm to output something more useful than [funky new character I found], and instead insert "female" into the text database, at some point we have to tell the algorithm about the character. I don't yet know of an OCR system that avoids this hard truth.
I like "AI-complete", though it wouldn't surprise me if general symbol recognition and interpretation turned out to be easier than natural language, whereas all NP-complete problems are, by definition, equally hard.
I kept my initial comment technical, without delving into the philosophical aspects of it, but now I can ramble a bit.
I suspect that general symbol recognition and interpretation is AI-complete, because of these issues of context, world knowledge, and quasi-unsupervised online learning.
I believe there is a generalized learning algorithm (or set of algorithms), drawing at minimum on frequencies and in-built biological heuristics, that we use to approach the world. In this view, natural language generation and understanding is one manifestation of this more general learning system (or constantly updating pattern recognition, if you like, though I think there may be more to it than simple recognition). Symbol recognition and interpretation is another.
"Recognition" and "interpretation" are themselves slippery words that hide the how and the what of what we do when we see a symbol. Computational linguists and psycholinguists have done a good job of demonstrating that we know very little about what we're actually doing when we process visual and auditory input.
You are right that AI-complete probably hides finer levels of equivalence classes, wrapped up in the messy issue of what we mean by intelligence. Still, it's a handy shorthand for problems that may require this more general learning facility, about which we understand very little.