If you could build an AI that did nothing but parse published articles to answer the question, “Has anyone said X?”, that would be very useful, and very safe. I worked on such a program (SemRep) at NIH. It works pretty well within the domain of medical journal articles.
If it could take one step more, and ask, “Can you find a set of one to four statements that, taken together, imply X?”, that would be a huge advance in capability, with little if any additional risk.
I added that capability to SemRep, but no one has ever used it, and it isn’t accessible through the web interface. (I introduced a switch that makes it dump its output as structured Prolog statements instead of as a flat file; you can then load them into a Prolog interpreter, ask queries, and have it perform Prolog inference.) In fact, I don’t think anyone else is aware that the capability exists; my former boss thought it was a waste of time, was angry with me for having spent a day implementing it, and has probably forgotten about it. It needs some refinement to work properly, because a search of, say, 100,000 article abstracts will find many conflicting statements. It needs to pick one of “A / not A” for every A found directly in an article, based on the number and quality of the assertions found in favor of each.
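To make that concrete, here is a toy sketch of what such a Prolog dump and a small inference over it could look like. The predicate names, facts, and rule below are invented for illustration; they are not SemRep’s actual output format or rule set.

    % Toy sketch only: invented predicate names and facts, not SemRep's real output.
    % Each extracted assertion becomes a fact:
    %   relation(Subject, Verb, Object, SourceAbstract).
    relation(drug_x, decreases, protein_y, pmid(11111111)).
    relation(protein_y, causes, disease_z, pmid(22222222)).

    % A toy inference rule: a drug is a candidate treatment for a disease
    % if it decreases something reported to cause that disease.
    may_treat(Drug, Disease) :-
        relation(Drug, decreases, Factor, _),
        relation(Factor, causes, Disease, _).

    % ?- may_treat(drug_x, D).
    % D = disease_z.

The “A / not A” filtering would then amount to deciding, before a fact like these is admitted to the database, whether the assertions supporting it outweigh, in number and quality, the assertions supporting its negation.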
How close do you have to get to natural language to do the search?
I’ve wondered whether a similar system could check legal systems for contradictions—probably a harder problem, but not as hard as full natural language.
Most of the knowledge used is in its ontology. It doesn’t try to parse sentences with categories like {noun, verb, adverb}; it uses categories like {drug, disease, chemical, gene, surgery, physical therapy}. It doesn’t categorize verbs as {transitive, intransitive, etc.}; it categorizes verbs as, e.g., {increases, decreases, is-a-symptom-of}. When you build a grammar (by hand) out of word categories that are this specific, most NLP problems disappear.
ADDED: It isn’t really a grammar, either—it grabs onto the most-distinctive simple pattern first, which might be the phrase “is present in”, and then says, “Somewhere to the left I’ll probably find a symptom, and somewhere to the right I’ll probably find a disease”, and then goes looking for those things, mostly ignoring the words in between.
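A minimal sketch of that anchor-first extraction, using the kind of semantic categories described above; the lexicon entries, the anchor handling, and the predicate names are invented placeholders rather than SemRep’s actual lexicon or rules:

    % Toy sketch: a tiny semantic lexicon mapping words straight to domain categories.
    :- use_module(library(lists)).   % append/3, member/2
    category(fatigue, symptom).
    category(anemia,  disease).
    category(aspirin, drug).

    % Split a token list at the anchor phrase "is present in".
    anchor_split(Tokens, Left, Right) :-
        append(Left, [is, present, in | Right], Tokens).

    % Find the anchor first, then look anywhere to its left for a symptom
    % and anywhere to its right for a disease, ignoring the words in between.
    symptom_of(Tokens, Symptom, Disease) :-
        anchor_split(Tokens, Left, Right),
        member(Symptom, Left),  category(Symptom, symptom),
        member(Disease, Right), category(Disease, disease).

    % ?- symptom_of([severe, fatigue, is, present, in, most, anemia, patients], S, D).
    % S = fatigue, D = anemia.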
I don’t know what you mean by ‘ontology’. I thought it meant the study of reality.
I can believe that the language in scientific research (especially if you limit the fields) is simplified enough for the sort of thing you describe to work.
See: http://en.wikipedia.org/wiki/Ontology_(information_science)
“In computer science and information science, an ontology is a formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts. It is used to reason about the entities within that domain, and may be used to describe the domain.”
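As a concrete (and entirely invented) miniature of what such an ontology looks like, in the same Prolog style as the sketches above:

    % An invented miniature ontology: a few concepts, an is-a hierarchy
    % over them, and one typed relation between concept classes.
    isa(metformin, drug).
    isa(drug, chemical_substance).
    isa(type_2_diabetes, disease).

    % The "treats" relation is declared to connect drugs to diseases.
    relation_signature(treats, drug, disease).

    % is-a is transitive, which is what lets a reasoner conclude that
    % metformin is also a chemical substance.
    kind_of(X, Y) :- isa(X, Y).
    kind_of(X, Y) :- isa(X, Z), kind_of(Z, Y).

    % ?- kind_of(metformin, chemical_substance).
    % true.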