I feel like you’d need to specify for what kind of person these statements should seem about 50% likely. That can vary a lot with someone's knowledge background: as a European, I have no idea whether or not Iowa and Ohio are neighboring states.
That said, I think geographical questions might work well, because such statements should be easy to generate and to find evidence for or against.
Examples:
The Great Slave Lake is the 11th largest lake in the world
Algeria is the 12th largest country in the world
Israel is bigger than New Jersey
Germany is smaller than Montana
Sulawesi is one of the ten largest islands in the world
(some of these are false, some are true).
To create these statements, one could look up Wikipedia lists, e.g. List of islands by area, List of countries by area, List of rivers by length, and so on.
Writing a script that extracts statements from this type of data should be feasible, and one could write it such that for each true statement extracted, a wrong statement is created as well.
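A minimal sketch of such a script in Python, assuming the Wikipedia list has already been parsed into (name, size) pairs; the parsing step is left out, and the function names and demo data are hypothetical. It produces two of the statement forms from the examples, "one of the 10 largest" and "X is bigger than Y", and pairs every true statement with a false one:

```python
import random
from typing import List, Tuple

Item = Tuple[str, float]       # (name, size), e.g. area from a "List of ... by area" table
Statement = Tuple[str, bool]   # (statement text, whether it is true)


def top_n_statements(items: List[Item], noun: str, n: int = 10) -> List[Statement]:
    """Pair each true 'one of the n largest' statement with a false one.

    False statements use items ranked just below the cutoff (n+1, n+2, ...),
    which should make them harder to tell apart from the true ones.
    """
    ranked = sorted(items, key=lambda it: it[1], reverse=True)
    inside, outside = ranked[:n], ranked[n:]
    out: List[Statement] = []
    # zip stops at the shorter list, so every true statement gets a false partner
    for (t, _), (f, _) in zip(inside, outside):
        out.append((f"{t} is one of the {n} largest {noun}", True))
        out.append((f"{f} is one of the {n} largest {noun}", False))
    return out


def comparison_statements(items: List[Item], k: int, rng: random.Random) -> List[Statement]:
    """Return k true and k false 'A is bigger than B' statements from random pairs."""
    if len({size for _, size in items}) < 2:
        raise ValueError("need at least two items with different sizes")
    out: List[Statement] = []
    while len(out) < 2 * k:
        (a, sa), (b, sb) = rng.sample(items, 2)
        if sa == sb:
            continue  # ties have no clear answer, so skip them
        big, small = (a, b) if sa > sb else (b, a)
        # Even slots get a true statement, odd slots a false one (k of each).
        if len(out) % 2 == 0:
            out.append((f"{big} is bigger than {small}", True))
        else:
            out.append((f"{small} is bigger than {big}", False))
    return out


if __name__ == "__main__":
    # Hypothetical placeholder data, not real measurements.
    demo = [(f"Island {c}", float(s)) for c, s in zip("ABCDEFGHIJKL", range(1200, 0, -100))]
    rng = random.Random(42)
    statements = top_n_statements(demo, "islands") + comparison_statements(demo, 3, rng)
    rng.shuffle(statements)  # so true and false statements don't alternate
    for text, truth in statements:
        print(truth, text)
```

Taking the false "top 10" candidates from just below the cutoff is a design choice meant to keep the statements near the 50% mark rather than obviously false.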
I find these questions very hard to judge; given a world map (without borders), however, that changes. You could also tell me how many people live in the countries or states mentioned, how large one of these countries is in absolute numbers, or the greatest depth of the Great Slave Lake and of fifteen other lakes in the world.
Once these statements are available, they could not only be used for calibration training, but also for exercises about seeking the truth in groups.
I would recommend against “X is the 13th largest Y”, because for anyone other than people who’ve memorized the top twenty Ys, getting this right is pure guesswork. “One of the 10 largest” is better; so is “X is bigger than Y”.
Well, if one can come up with the top Ys, one can reason about what probability to assign to that statement. For example, if I can think of 9 countries that I think are bigger than Algeria, plus three more about which I am uncertain, I can reasonably assign a probability of, say, 30%. Calibration training could be done this way.
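The Algeria reasoning can be made explicit. This is a sketch under simplifying assumptions chosen for illustration: the 9 countries are certainly bigger, each of the three uncertain countries is bigger with probability 0.5, and those three are independent. The function name is hypothetical.

```python
from typing import Dict, List


def rank_distribution(n_certain: int, p_uncertain: List[float]) -> Dict[int, float]:
    """Probability of each possible rank of item X.

    n_certain:   items believed to be certainly bigger than X
    p_uncertain: for each uncertain item, the probability that it is bigger
                 (treated as independent of the others)
    """
    # dist[k] = probability that exactly k of the uncertain items are bigger
    dist = [1.0]
    for p in p_uncertain:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # this item turns out smaller than X
            new[k + 1] += q * p    # this item turns out bigger than X
        dist = new
    # X's rank is 1 + the number of items bigger than it
    return {1 + n_certain + k: q for k, q in enumerate(dist)}


dist = rank_distribution(9, [0.5, 0.5, 0.5])
print(dist[12])  # 0.375
```

Under these assumptions "Algeria is the 12th largest country" comes out at 37.5% (exactly two of the three uncertain countries must be bigger), in the same range as the 30% guess. The model ignores countries one has forgotten entirely, which is the objection raised in the reply.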
Yeah, I guess, but that’s a whole lot of work for one short question of this kind, and if you can think of 12 candidates, there’s a good chance you’ve forgotten a couple. I don’t mean to imply that this kind of question is completely useless, only that other sorts are probably better.
I’d say it depends on what exactly you want to do once you have the statements.
USGS has good info.
http://www.usgs.gov/ http://cegis.usgs.gov/ontology.html
http://dbpedia.org/About
Also, there is no need to scrape Wikipedia; the work has been done for you. You can use SPARQL queries to get most of your statements, and the CEGIS site supposedly has a working SPARQL endpoint, but I haven’t used that in years.
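A sketch of the SPARQL route against DBpedia, using only the Python standard library. The endpoint URL (https://dbpedia.org/sparql), the ontology terms dbo:Country and dbo:areaTotal, and the `format` parameter are my assumptions about the current DBpedia setup, so check them before relying on this. dbo:Country also matches historical states and some countries have several area values, so the results will need cleaning; the unit of the area values should be checked too. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Public DBpedia SPARQL endpoint (assumed address).
ENDPOINT = "https://dbpedia.org/sparql"

# Countries with a total area, largest first. dbo:Country may include
# historical states, so filter the output before generating statements.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?country ?area WHERE {
  ?country a dbo:Country ;
           dbo:areaTotal ?area .
}
ORDER BY DESC(?area)
LIMIT 50
"""


def build_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a GET request asking for SPARQL JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{endpoint}?{params}"


def parse_bindings(result: dict) -> List[Tuple[str, float]]:
    """Turn a SPARQL 1.1 JSON result into (name, area) pairs."""
    rows = []
    for b in result["results"]["bindings"]:
        uri = b["country"]["value"]
        # ".../resource/Some_Place" -> "Some Place"
        name = urllib.parse.unquote(uri.rsplit("/", 1)[-1]).replace("_", " ")
        rows.append((name, float(b["area"]["value"])))
    return rows


def fetch(query: str = QUERY) -> List[Tuple[str, float]]:
    """Run the query over the network (not called below)."""
    with urllib.request.urlopen(build_url(query), timeout=30) as resp:
        return parse_bindings(json.load(resp))
```

The resulting (name, area) pairs can then be fed to whatever statement-generating script one writes, instead of scraping the Wikipedia lists by hand.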