IMO what we need is online revision-controlled semantic databases of scientific knowledge which can be edited by all researchers (at least), are freely accessible by everyone and have convenient APIs for data mining.
Interesting, but how would that work out in practice? Should anyone be able to revise, e.g. books? Who should decide in the case of conflict? Should we use majority voting? Weighted majority voting? Should the original author have a special say?
I am not talking about books. Think about something like Wikipedia, only semantic (that is, having a formal structure that allows automated queries, e.g. a relational database). I imagine the revision process should also be along the lines of Wikipedia, i.e. users work out problems among themselves, and if they fail, community-appointed administrators resolve it.
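To make the idea concrete, here is a minimal sketch of what "revision-controlled and queryable" could look like in a relational database. Everything in it is an assumption for illustration: the `revisions` table, its columns, and the helper names `open_db`, `edit`, `current` and `history` are hypothetical, and a real system would also need accounts, permissions and conflict resolution.

```python
import sqlite3


def open_db():
    """Create an in-memory database with an append-only revision table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE revisions (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            entity   TEXT NOT NULL,
            property TEXT NOT NULL,
            value    TEXT NOT NULL,
            editor   TEXT NOT NULL
        )
        """
    )
    return conn


def edit(conn, entity, prop, value, editor):
    """Record a new revision; earlier revisions are never overwritten."""
    conn.execute(
        "INSERT INTO revisions (entity, property, value, editor) VALUES (?, ?, ?, ?)",
        (entity, prop, value, editor),
    )


def current(conn, entity, prop):
    """Return the latest value of a property, or None if it was never set."""
    row = conn.execute(
        "SELECT value FROM revisions WHERE entity = ? AND property = ? "
        "ORDER BY id DESC LIMIT 1",
        (entity, prop),
    ).fetchone()
    return row[0] if row else None


def history(conn, entity, prop):
    """Return every (value, editor) pair for a property, oldest first."""
    rows = conn.execute(
        "SELECT value, editor FROM revisions WHERE entity = ? AND property = ? "
        "ORDER BY id",
        (entity, prop),
    )
    return [(value, editor) for value, editor in rows]


if __name__ == "__main__":
    conn = open_db()
    edit(conn, "quartz", "formula", "SiO", "alice")   # a mistaken first entry
    edit(conn, "quartz", "formula", "SiO2", "bob")    # a later correction
    print(current(conn, "quartz", "formula"))
    print(history(conn, "quartz", "formula"))
```

Because edits are only ever appended, the full history stays available for review and rollback, which is the part that mirrors Wikipedia's revision model.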
DBpedia does provide a way to run automated queries against Wikipedia.
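As a rough illustration of such a query, the sketch below builds a request URL for DBpedia's public SPARQL endpoint. It only constructs the URL and does not send it. The helper name `build_query_url` is hypothetical, and the endpoint address and the `format` parameter reflect my understanding of the public service, so check them against the DBpedia documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed address of the public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_query_url(resource, limit=10):
    """Build a URL that asks for property/value pairs of one DBpedia resource.

    `resource` is the last path segment of a DBpedia resource URI,
    for example "Quartz".
    """
    query = (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?property ?value . "
        f"}} LIMIT {int(limit)}"
    )
    params = {
        "query": query,
        # Assumption: the endpoint returns JSON when asked for this format.
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("Quartz", limit=5))
```

The resulting URL can be fetched with any HTTP client; the response lists whatever structured statements DBpedia has extracted for that article.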
For a lot of knowledge, however, it’s not trivial to structure it in a useful way. Turning the abstract of a scientific paper into computer-readable knowledge is not trivial and requires a lot of thinking about ontology and the structuring of knowledge.
It’s the same problem you have when writing Anki cards. It’s very hard to formalize real knowledge.
Agreed. However, there is a lot of knowledge which is easy to structure. Examples: minerals, crystals, chemical substances, chemical reactions, biological species, genes, proteins, metabolic pathways, biological cell types, astronomical objects, geographical / geological objects, archaeological findings, demographical data (including historical)...
In practice we need a mixture of structured and unstructured (like regular Wikipedia) information.
There’s no real reason why proteins and archaeological findings should be in the same database.
> genes, proteins
UniProt, with both Swiss-Prot and TrEMBL, works well.
> minerals, crystals, chemical substances
PubChem is in principle a good way to store information about chemical substances.
I don’t know much about crystals, but do we need a database about them that is separate from PubChem?
From what I’ve heard, the data quality of PubChem isn’t ideal. But that’s not a problem that is easily solved by creating a new database.
I don’t know much about astrophysics, but I would be surprised if those folks have enough money to buy all those telescopes but not enough money to have a good database of astronomical objects.
> geographical / geological objects,
OpenStreetMap is open data for geography. Do you think it lacks something?
http://www.obofoundry.org/ also provides nice information. In bioinformatics there are plenty of people who care about organising databases of knowledge that’s easy to structure.
> archaeological findings,
I have no idea on that front. It could be that the related academics don’t use computers enough to have a decent database.
> demographical data (including historical)
I don’t know of the right source, but there is probably a lot of complicated copyright involved. Differing definitions of terms also complicate things. Illegal drug sales recently got added to the [British GDP](http://marginalrevolution.com/marginalrevolution/2014/02/improving-gdp.html). Building a database that makes it easy to compare numbers won’t be easy.
Great links, thanks! The situation looks much better than I assumed.
> I don’t know much about crystals, but do we need a database about them that is separate from PubChem?
Probably not separate. However, it seems that PubChem doesn’t store data about crystal structure, unless I’m missing something (I looked at the entries for SiO2 and NaCl)? Also, PubChem doesn’t seem to have lots of data about reactions.
> OpenStreetMap is open data for geography. Do you think it lacks something?
For geography it’s probably good, but it doesn’t seem to have much data about geology, unless I’m missing something? The latter would require some sort of 3D map of the Earth’s crust.
In general in the last decade a lot of people in the bioinformatics community tried to find solutions to problems in that sphere.
People like Barry Smith did a lot of work on ontology, and we now have a bioinformatics-driven ontology for emotions because psychologists just don’t work on that level. When it comes to what the psychologists themselves produce, they are stuck with utter crap like the DSM-5. The DSM is produced by the American Psychiatric Association.
PubChem is probably reasonably good where it touches areas that bioinformatics is interested in, but crystals aren’t in that sphere.
A lot of the information about chemicals that’s out there is also the intellectual property of big pharma companies, who aren’t happy to share it in an open fashion. The American Chemical Society fought against PubChem being well funded.
It’s an interesting pattern. Bioinformatics might work precisely because it has no huge society of bioinformaticians that can hold back scientific progress in the way the associations of chemists and psychologists do.
> For geography it’s probably good, but it doesn’t seem to have much data about geology, unless I’m missing something? The latter would require some sort of 3D map of the Earth’s crust.
I don’t know exactly, but I think if the data is available it should go somewhere in that project.
IMO what we need is online revision-controlled semantic databases of scientific knowledge which can be edited by all researchers (at least), are freely accessible by everyone and have convenient APIs for data mining.
Interesting, but how would that work out in practice? Should anyone be able to revise, e.g. books? Who should decide in the case of conflict? Should we use majority voting? Weighted majority voting? Should the original author have a special say?
I am not talking about books. Think about something like Wikipedia, only semantic (that is, having a formal structure that allows automated queries, e.g. a relational database). I imagine the revision process should also be along the lines of Wikipedia, i.e. users work out problems among themselves, and if they fail, community-appointed administrators resolve it.
Like Wikidata and Wikispecies?
Yep, I’d say these are good examples.
http://eol.org/
I think there are databases for those things; http://en.wikipedia.org/wiki/List_of_biological_databases#Metabolic_pathway_and_Protein_Function_databases lists a bunch.