The “Spot the Fakes” Test
Followup to: Are You a Solar Deity?
James McAuley and Harold Stewart were mid-20th-century Australian poets, and they were not happy. After watching society ignore their poetry in favor of “experimental” styles they considered fashionable nonsense, they wanted to show everyone what they already knew: the Australian literary world was full of empty poseurs.
They began by selecting random phrases from random books. Then they linked them together into something sort of like poetry. Then they invented the most fashionable possible story: Ern Malley, a loner working a thankless job as an insurance salesman, writing sad poetry in his spare time and hiding it away until his death at an early age. Posing as Malley’s sister, who had recently discovered the hidden collection, they sent the works to Angry Penguins, one of Australia’s top experimental poetry magazines.
You wouldn’t be reading this if the magazine hadn’t rushed a special issue to print in honor of “a poet in the same class as W.H. Auden or Dylan Thomas”.
The hoax was later revealed [1], everyone involved ended up with egg on their faces, and modernism in Australia received a serious blow. But as I am reminded every time I look through a modern poetry anthology, one Ern Malley every fifty years just isn’t enough. I daydream about an alternate dimension where people are genuinely interested in keeping literary criticism honest. In that universe, any would-be literary critic would have to distinguish between ten poems generally recognized as brilliant that he’d never seen before, and ten pieces of nonsense invented on the spot by drunk college students, in order to keep his critic’s license.
Can we refine this test? And could it help Max Muller with his solar deity problem?
In the Malley hoax, McAuley and Stewart suspected that a certain school of modernist poetry was without value. Because its supporters were too biased to admit this directly, they submitted a control poem they knew was without value, and found the modernists couldn’t tell the difference. This suggests a powerful technique for determining when something otherwise untestable might be, as Neal Stephenson calls it, bulshytte.
Perhaps Max Muller thinks Hercules is a solar deity. He will write up an argument for this proposition, and submit it for consideration before all the great mythologists of the world. Even if these mythologists want to be unbiased, they will have a difficult time of it: Muller has a prestigious reputation, and they may not have any set conception of what does and doesn’t qualify as a solar deity.
What if, instead of submitting one argument, Muller submitted ten? One sincere argument for why Hercules is a solar deity, and nine bogus arguments for why Perseus, Bellerophon, Theseus, et cetera are solar deities (which he nevertheless constructs to the best of his ability). Then he instructs the mythologists: “Please independently determine which of these arguments is sincere, and which ones I came up with by writing ‘X is a solar deity’ as my bottom line and then inventing fake justifications after the fact.” If every mythologist finds the Hercules argument most convincing, that doesn’t prove anything about Hercules, but it at least shows Muller has a strong case. On the other hand, if they’re all convinced by different arguments, or find none of the arguments convincing, or, worst of all, they all settle on Bellerophon, then Dr. Muller knows his beliefs about Hercules are quite probably wishful thinking.
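For concreteness, here is a minimal sketch (mine, not Muller’s) of the statistics behind such a test, under hypothetical numbers: ten mythologists each independently name the single argument they find most convincing out of ten candidates. Under the null hypothesis that the sincere argument is indistinguishable from the fakes, each judge picks it with probability one in ten, so the number of judges who converge on it follows a binomial distribution.

```python
from math import comb

def p_value(judges: int, arguments: int, hits: int) -> float:
    """Probability that at least `hits` of `judges` independent judges
    pick the sincere argument by chance alone, when each judge chooses
    uniformly among `arguments` candidates (binomial upper tail)."""
    p = 1.0 / arguments
    return sum(
        comb(judges, k) * p**k * (1 - p) ** (judges - k)
        for k in range(hits, judges + 1)
    )

# If 8 of 10 mythologists independently converge on the Hercules argument,
# chance agreement is vanishingly unlikely (~4e-7), so Muller at least has
# a real case. If the choices are spread evenly across the ten arguments,
# the test gives him no such comfort.
print(p_value(judges=10, arguments=10, hits=8))
```

The exact numbers are illustrative only; the point is that near-unanimous agreement on the sincere argument is far stronger evidence than scattered or absent agreement.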
This method hinges on Dr. Muller’s personal honesty: a dishonest man could simply do a bad job arguing for Theseus and Bellerophon. What if we thought Dr. Muller was dishonest? We might find another mythologist whom independent observers rate as equally persuasive as Dr. Muller, and ask her to come up with the bogus arguments.
The rationalists I know sometimes take a dim view of the humanities as academic disciplines. Part of the problem is the seeming untestability of their conclusions through good, blinded experimental methods. I don’t think most humanities professors are really looking all that hard for such methods. But for those who are, I consider this technique a little better than nothing [2].
Footnotes
[1]: The Sokal Affair is another related hoax. Wikipedia’s Sokal Hoax page has some other excellent examples of this sort of test.
[2]: One more example where this method could prove useful. I remember debating a very smart Christian on the subject of Biblical atrocities. You know, stuff about death by stoning for minor crimes, or God ordering the Israelites to murder women and enslave children, that sort of thing. My friend was always able to come up with a superficially plausible excuse for each one, and it was getting on my nerves. But having just read Your Strength as a Rationalist, I knew that being able to explain anything wasn’t always a virtue. I proposed the following experiment: I’d give my friend ten atrocities commanded by random Bronze Age kings generally agreed by historical consensus to be jerks, and ten commanded by God in the Bible. His job would be to determine which ten, for whatever reason, really weren’t all that bad. If he identified the ten Bible passages, that would be strong evidence that Biblical commandments only seemed atrocious when misunderstood. But if he couldn’t tell the difference between God and Ashurbanipal, that would be strong evidence God wasn’t really that great. To my disgust, my friend knew his Bible so well that I couldn’t find any atrocities he wasn’t already familiar with. So much for that technique. I offer it to anyone who debates theists whose knowledge of Scripture is less comprehensive.
I think the “random words and phrases” I keep seeing in these comments is a bit of an exaggeration. Reading the (completely undocumented) Wikipedia article, I get the understanding that they crafted these poems from their own previous work and original ideas, as well as phrases clipped wholesale from a book of quotations, deliberately convoluted rhymes from a rhyming dictionary, and so on. Nonetheless, they strung them together with some sense of purpose: the selection was not technically random.
If you read an example from that article, you will see that it has some continuity; it’s not the gibberish you would get from having a computer program randomly select phrases. So what you have, rather, is poetry written by poets using an unconventional method for an unconventional purpose. It’s not surprising there are those who found this interesting, but I know absolutely nothing of poetry.
Curiously, a similar argument was applied to Sokal’s hoax. It, too, is not random gibberish, and it is not surprising at all that the editors of Social Text found it interesting. But does it carry actual value? Going by Weinberg’s analysis, it has quite a few deliberate physics mistakes that could have been spotted by an undergraduate.
I have no idea how poetry buffs go about spotting obvious mistakes in poetry, but if semi-random stuff repeatedly gets accepted as genuine (Wikipedia has a bunch of links under the Literary Hoaxes category), the field is in trouble.
My favorite example of this kind of phenomenon is water quality. Everyone and their mother claims they can taste the difference between tap water, tap water filtered through a Brita, and bottled water (and some people are even dumb enough to say that they prefer one brand of bottled water over another). But give them a blind taste test (very easy to perform at home – I encourage you all to do it), and nobody even tries to discern the difference – they usually admit immediately that they taste no difference between the waters.
I just performed this experiment, agreeing with the general point but still moderately confident I’d be able to tell the difference.
I couldn’t.
I just performed this experiment assuming I wouldn’t be able to discern tap water from water cooler water, but I was able to—my tap water is slightly more metallic tasting.
This seems to be almost the inverse of the My Favourite Liar technique (I initially thought this post was going to be about the suggestion given in this comment that respected posters on this site adopt that technique as a test of the group’s rationality). The issue here, of course, is that you would intersperse some “plausible lies” into your lectures and no-one would be able to tell the difference.
This brings together so many of the ideas from Eliezer’s rationality series that I’m struggling to decide which post to link to (probably this one is most relevant).
If your justifications for your beliefs are indistinguishable from fake justifications for false beliefs, they are worthless to a truth-seeker. Any body of knowledge which can’t pass the “Spot the Fakes” Test is no knowledge at all.
(I can’t resist linking to this in this context)
Standards of quality can be taught. More importantly, the criteria for quality can be so narrowed that eventually they become specific recognition rules for a particular type of input.
For example: I don’t doubt that there are aesthetic and sensory properties of wines that render some more pleasant than others, and certainly there may be general principles held by many people that make some types and varieties of wine better than others. Individual taste may vary, but some things may honestly taste better.
But with quite a few wines, the definition of a quality variety has (I suspect) ceased having anything to do with whether it’s tasty in the reviewer’s mind, and has more to do with meeting learned criteria for that variety. Eventually the standards became so narrow that any deviation away from a narrow state means it’s an “inferior” wine.
Similar things may have happened to classical music. If people are sensitive to even tiny variations, and evaluations of quality depend on hitting a very narrow target, experimentation becomes so expensive that it’s not worthwhile.
The early days of jazz are almost the perfect antithesis of this process.
There was a study I won’t bother to look up now which showed that while wine experts could discriminate between cheap and expensive wines, and got much more enjoyment from the expensive ones (or at least claimed to), people who were new to wine reported no differences between the groups in either objective quality or subjective enjoyment.
I find the Ern Malley episode a bit puzzling. Yes, if we’re talking about, say, the origin of a religious story, then it’s possible to reveal the theory as false. But there isn’t such a thing as “true” or “false” in prose. Even if McAuley and Stewart purposely wrote what they considered to be bad poems, what does their intent matter if they produced poems others thought were good? If I like a book, my liking of it doesn’t become “wrong” or “mistaken” if it’s revealed that the author was actually trying to write a bad book, any more than my disliking a book becomes mistaken simply because the author was trying to write a good one.
I think modernist poetry is invested in the belief that there is something special about excellent modernist poems; that is, their structure produces emotions or enlightenment not found in random series of words, and modernist poets deserve high status because they can create structures with this emotion or enlightenment.
A competing hypothesis is that the arrangement of words in modernist poetry has no particular value at all, and that people who claim it has value are only doing so to gain status within the modernist poet community.
I think the important part of the Malley experiment was that the two hoaxers created their poems from randomly chosen phrases taken by opening books to random pages. If there’s no way to distinguish a random collection of words from a great modernist poem, then people who can create “great modernist poems” aren’t special and don’t deserve high status, and it supports the hypothesis that people who claim to have been moved by modernist poetry are faking it to look highbrow.
But in reference to your point, I remember reading Lovecraft’s poem “Nathicana” and being very impressed by it. I was astonished to discover a few years later that he wrote it as a parody of people who stick too much emotion into their poetry. I was only slightly mollified to learn I wasn’t the only person who liked it and that it was often held up as an example of how a deliberately bad poem can sometimes be pretty good.
Setting up this sort of experiment, especially in regard to poetry or other humanities topics, seems to be the overwhelming barrier.
We can take at face value that “Malley’s” poems were created from phrases of a limited length selected at random (whatever that really means in this case) and then arranged in a random manner.
This setup would allow us to say that some modernist critics cannot distinguish a modernist poem written by a single person (although with possible allusions and cribbings) from one constructed with phrases less than a specified length from a specific pool of literature.
From what I have found on the affair, it is hard to see if there was much experimental design at all (a criticism that Sokal can share in).
In this specific case, we are stuck with two people who seem to have intentionally created a spoof of modernist poetry that is not a terrible representation of the genre. For a progressive journal to publish something designed to make a strong attempt at passing as modernist poetry, using the new technique of collage, seems completely appropriate.
Does this seem like an adequate control poem for an experiment of this sort:
[quoted poem not reproduced here]
I suspect that a randomly generated poem from a large amount of source material would look significantly different. I tried out some Google poem generators (which are probably not acceptable for this sort of experiment either), and the results weren’t as nice: http://shawnrider.com/google/index.php?query=modernism&Submit=generate+poem
In the end, problems with authorship and creation by collage are two of the widely recognized features of modernist poetry (http://en.wikipedia.org/wiki/Modernist_poetry_in_English). The hoax seems to prove that some of modernist poetry’s techniques are indeed effective.
I think your point about intentionally created spoofs like Nathicana coming out as good poetry drives home that these sorts of parodies aren’t necessarily good examples of control-poem construction.
Making these sorts of critiques brings in the distinction between being rational and rationalizing (http://www.overcomingbias.com/2007/09/rationalization.html). If you already have a point you want to prove and proceed to construct a method whereby you’ll prove it, it isn’t truly rational. If you spend a long time working on experimental design and becoming curious about how these methods (structural analysis of myths, or modernist poetry) succeed or fail against a random smattering of words and ideas, then you can build some rational knowledge on the matter.
While I like the idea of the spot-the-fakes test, I think it would be difficult to come up with good examples where the experimental design really leads to interesting conclusions within the scope of the project.
If a typical randomly-generated work of art is good, then most possible works of art are good, which means you’re setting the bar for “good” (or “liked by me”) too low.
Usually in art-criticism, the critics aren’t really concerned with whether any one particular person likes or dislikes the particular work of art. Instead, they are trying for less value-oriented analysis, such as what themes might be present, what techniques were used, etc.
So probably the embarrassment from the hoax was due to the “fake-critics” claiming that surely one particular passage was a truly inspired example of symbolism of the dichotomy present in male-female relationships or whatever, only for the hoaxers to reveal that there was no symbolism at all, as the passage was randomly generated.
Dawkins’ enjoyably incisive description of the Sokal affair
This is a really excellent technique in a lot of contexts.
I offer a word of caution about actually using it with theists, even those less Biblically literate than Yvain’s friend: the catch-all excuse that many (not all) theists make for Biblical atrocities is precisely that they were commanded by God, and thus on some version of Divine Command Theory are rendered okay—not that the atrocities are in some observable way actually less bad than those committed by other groups or religions.
The assumption that a work of art has an independent value, which is linked to its enjoyment by the consumer, is out of date. In the modern world, art also has an important social function. It separates the cultural elite, who find the art’s “colors and patterns exceptionally beautiful”, from the philistines, “who are unfit for their office or unpardonably stupid”.
A software request. Can we get #-links to footnotes? I wanted to tweet footnote 2, but it doesn’t have any anchor. Or put it into a separate post, as it’s awesome.
It was already possible in the software; I was just too lazy to do it. Added anchors footnote1 and footnote2.