Text Patterns - by Alan Jacobs

Wednesday, February 23, 2011

I want to believe

Returning to the subject of today’s earlier post: The authors of that study write this in summation:

Statistical findings, said Heuser, made us realize that genres are icebergs: with a visible portion floating above the water, and a much larger part hidden below, and extending to unknown depths. Realizing that these depths exist; that they can be systematically explored; and that they may lead to a multi-dimensional reconceptualization of genre: such, we think, are solid findings of our research.

Nothing this vague counts as “solid findings.” What does it mean to say that a genre is like an iceberg? What are those “parts” that are below the surface? What sorts of actions would count as “exploring those depths”? What would be the difference between “systematically” exploring those depths and doing so non-systematically? What would a “reconceptualization” of genre look like? Would that be different than a mere adjustment in our generic definitions? What would be the difference between a “multi-dimensional reconceptualization of genre” and a unidimensional one?

The rhetoric here is very inflated, but if there is substance to the ideas I cannot see it. I would like to be able to see it. Like Agent Mulder, I want to believe — but these guys aren't making it easy for me.


  • This is mostly a rephrasing of what Brandon already said in response to your earlier post. (I still have to read the paper, so this is based on what I understand from your post.)

    (1) We understand intuitively what "genre" means, but let's say we wanted a formal list of the features that constitute a genre. Now, to be sure, a genre cannot be reduced to a list of its features, but we could offer such a list without a computer program. What a computer program can do is offer up features that we wouldn't ordinarily have thought of, because of the computer's ability to crunch numbers.

    (2) I think computer programs can also offer up a list of what I call "not-features," meaning that they can tell us what a genre doesn't have. Possibly.

    Now not all studies can accomplish this - and maybe most studies will only start to tell us what we already intuitively know (although it's nice to have intuitive knowledge verified) - but I can certainly see a value to this kind of analysis.

    But yes, if the point is to ask whether this kind of analysis can help us understand a text better, I don't know. But if we think of these analyses as tools for understanding certain regularities that exist in texts -- regularities that are invisible to us -- then yes, they have value.

  • scritic, can you tell me what you mean by "features," especially features that we can't see but programs we create can?

  • That's a good question. My knowledge of literary genres is passable, at best, but here are some (undoubtedly trivial and hypothetical) things I can think of.

    (1) When you construct a frequency plot of the parts of speech that occur in a text (nouns, verbs, etc.), each genre turns out to have a specific and uniquely shaped distribution.

    (2) Or romances have a lot of sentences that have the structure noun-verb-noun-verb-abstract noun. (Or something.)

    I can't think of any more right now, but you get the idea.

    I suspect, though, that what you're asking is whether, if we construct programs to look for these kinds of regularities, the programmer (or researcher) is already looking for them and the program only finds them for him (or verifies for him that they exist).

    This is true, although lots of advances have been made in pattern recognition and machine learning. For instance, there is something called "boosting" that lets us construct weak pattern recognizers and then combine them into a very strong one (and the strong classifier is more than the sum of its weak parts).

    Another example of this kind of "pattern recognition" in data is what some physicists are trying to do for cities. Although, even here you could argue that a mathematical relationship between the different parameters of urban spaces makes more sense than one between the parameters of texts. At least, one can trace the mathematical relationships for cities to certain causal factors, but no such causal factors come to mind for texts.

    And I'll admit I have a hard time trying to imagine how statistical regularities can help us interpret texts better, but I definitely think that the regularities we can find with computation will go far beyond what we could find without the programs.

  • I think I posted something but maybe it got caught in your spam filter?

  • Thanks for the comment, scritic, and sorry about the absurdities of the spam filter. What you say about boosting is very helpful: something for me to think with.

  • This is *exactly* the way Tolkien's books are described on the making-of DVDs that come with the big fat boxed set. The novels were the "tips of the iceberg" of Tolkien's world building.
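
The "boosting" idea scritic mentions in the thread above can be made concrete with a small sketch. The comment does not name a specific algorithm, so what follows assumes AdaBoost, one common form of boosting, and uses one-threshold "decision stumps" as the weak pattern recognizers. The toy data and all function names (`stump_predict`, `best_stump`, `adaboost`, `ensemble_predict`) are hypothetical and chosen only for illustration; nothing here comes from the study under discussion.

```python
import math

# Toy 1-D data (hypothetical): no single threshold separates the two labels.
XS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, polarity):
    """Weak recognizer: a single threshold test that answers +1 or -1."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    points = sorted(xs)
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            # Weighted error: total weight of the points this stump gets wrong.
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train an AdaBoost ensemble; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with every point equally important
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # this stump's vote strength
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest, renormalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Strong recognizer: the sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


single_error, _, _ = best_stump(XS, YS, [1.0 / len(XS)] * len(XS))
model = adaboost(XS, YS, rounds=3)
correct = sum(ensemble_predict(model, x) == y for x, y in zip(XS, YS))
print("best single stump accuracy:", 1 - single_error)
print("boosted accuracy:", correct / len(XS))
```

On this toy data the best single stump gets only 7 of the 10 points right, while the weighted vote of three stumps gets all 10. Each later stump is trained with extra weight on the points the earlier ones missed, which is the sense in which the combined recognizer outperforms any of its parts.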
