Now that I’ve tagged more than half of the evidence quotations in OED2 for genre [see here and here for a discussion of this process], it’s time to start having a poke around in the data. A question that occurred to me last night was: ‘is there anything interesting about the relative density of poetic vs. other kinds of quotations in OED’s entries?’ So this morning I got the computer to count the total number of quotations for each entry and the percentage of the tagged quotations coming from poetic sources (poetry and verse drama), and then look up the frequency of the headword in COCA. Here are the top few results, by poetic percentage (of the tagged quotations):
| Headword | Quote Count | %Poetry | COCA |
| a hall, phr. | 2 | 100% | 0 |
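For anyone wanting to try the same count, here is a minimal sketch of the tallying step described above. It is not the script I actually ran: the function name `poetic_stats`, the genre labels, and the toy quotations are all invented for illustration, and it assumes each quotation arrives as a (headword, genre) pair with `None` for quotations not yet tagged.

```python
from collections import Counter

# Assumption: these are the tag strings that count as "poetic sources".
POETIC_GENRES = {"poetry", "verse drama"}


def poetic_stats(quotations):
    """Tally quotations per headword.

    quotations: iterable of (headword, genre) pairs; genre is None if untagged.
    Returns {headword: (total_quotes, tagged_quotes, poetic_pct)}, where
    poetic_pct is a percentage of the *tagged* quotations (None if none tagged).
    """
    total, tagged, poetic = Counter(), Counter(), Counter()
    for headword, genre in quotations:
        total[headword] += 1
        if genre is not None:
            tagged[headword] += 1
            if genre in POETIC_GENRES:
                poetic[headword] += 1

    stats = {}
    for headword in total:
        if tagged[headword]:
            pct = 100.0 * poetic[headword] / tagged[headword]
        else:
            pct = None
        stats[headword] = (total[headword], tagged[headword], pct)
    return stats


# Toy data, invented for illustration only (not real OED2 entries).
sample = [
    ("alpha, n.", "poetry"),
    ("alpha, n.", "verse drama"),
    ("beta, v.", "poetry"),
    ("beta, v.", "prose"),
    ("beta, v.", None),
    ("gamma, a.", None),
]

stats = poetic_stats(sample)

# Rank by poetic percentage, highest first, skipping entries with nothing tagged.
ranked = sorted(
    (h for h in stats if stats[h][2] is not None),
    key=lambda h: stats[h][2],
    reverse=True,
)
```

As the toy ranking shows, an entry with only two quotations reaches 100% very easily, which is the problem with the unfiltered list.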
So that’s no good. But what if we limit the list to words with a good number of evidence quotations, say 100 or more? We get something with a little more bite:
Or, should I say, something that grabs our attention, since ‘fang, v.’ is an old word meaning ‘To lay hold of, grasp, hold, seize; to clasp, embrace.’ It’s labelled arch. or dial. in OED2, but survives in ‘newfangle’, ‘newfangled’, fairly unnewfangled words, having been in use before Chaucer was born (and used by him, after).
But back to the list. Besides the seeming predominance of negative-affect words, it’s apparent that many of the words are, like ‘fang, v.’, quite old. That may account for some of the bias toward poetic evidence in OED2: the farther back you go, the more of the available evidence comes from poetry (there were no novels or newspapers to illustrate 10thC uses of ‘strew’).
But even though all the words in the list are first recorded before 1350, and all but ‘rage, sb.’ come out of Old English, several continue to be fairly common and not generally regarded as particularly ‘poetic’.
So another way to filter the list would be to add a restriction on the COCA frequency of the word, to remove archaisms and other very rare words. Note that my COCA count doesn’t match for parts of speech (332 is certainly the count for the noun ‘fang’). Excluding the above results, here are the next few that have COCA frequencies above 900:
I’m surprised by some of this. The proportion of poetic quotations for ‘up’, ‘must’, and ‘us’ (we can ignore ‘May’ – the COCA count is for another part of speech) is almost three times the average for all quotations in OED2, and almost seven times the average for all OED2 headwords.
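A rough sketch of the two filters used so far (100 or more quotations, COCA frequency above 900) and of the two different averages mentioned here. The row layout, function names, and numbers are hypothetical, and I am assuming that ‘average for all quotations’ means pooling every tagged quotation together, while ‘average for all headwords’ means averaging the per-entry percentages.

```python
def filter_entries(rows, min_quotes=100, coca_above=None):
    """Keep entries with at least min_quotes quotations and, optionally,
    a COCA frequency strictly above coca_above."""
    kept = [r for r in rows if r["quotes"] >= min_quotes]
    if coca_above is not None:
        kept = [r for r in kept if r["coca"] > coca_above]
    return kept


def pooled_pct(rows):
    """Poetic share of all tagged quotations, pooled across entries."""
    tagged = sum(r["tagged"] for r in rows)
    poetic = sum(r["poetic"] for r in rows)
    return 100.0 * poetic / tagged if tagged else None


def mean_entry_pct(rows):
    """Unweighted mean of per-entry poetic percentages (entries with tags only)."""
    pcts = [100.0 * r["poetic"] / r["tagged"] for r in rows if r["tagged"]]
    return sum(pcts) / len(pcts) if pcts else None


# Toy rows, invented for illustration only (not real OED2 or COCA figures).
rows = [
    {"headword": "w1", "quotes": 200, "tagged": 100, "poetic": 50, "coca": 5000},
    {"headword": "w2", "quotes": 150, "tagged": 100, "poetic": 40, "coca": 300},
    {"headword": "w3", "quotes": 10, "tagged": 10, "poetic": 0, "coca": 2000},
    {"headword": "w4", "quotes": 4, "tagged": 0, "poetic": 0, "coca": 50},
]

well_attested = filter_entries(rows)                 # 100 or more quotations
still_common = filter_entries(rows, coca_above=900)  # ...and COCA above 900
```

In the toy data the pooled figure comes out higher than the per-entry mean, because a small entry with no poetic quotations counts as much as a large one in the second average. Something like that may be why the two baselines differ so much.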
Other observations are that while negative affect words continue to show up (sorrow, weary), there are more positive ones (rosy, dear, soft). And we’re starting to get a few more Latinate words (rosy, subtle, urge) as well.
So, what is the most poetic word in English, according to the OED’s compilers? On numbers alone, ‘worse’ is the clear winner, with ‘fang, v.’ disqualified for archaism. But surely at some point the COCA count must work against a word in this kind of contest – it can’t be ‘up’, surely.
For that reason, I’ve decided ‘rage, sb.’ must take first place. So I was pleased to find under that headword a definition I didn’t know: ‘8. Poetic or prophetic enthusiasm or inspiration; musical excitement.’ And, under another sub-sense, this line from Tennyson: ‘The captive void of noble rage.’
Here’s a plot of all 220,000 entries, with Quote Count on the Y axis and %Poetry on the X: