Monday, December 20, 2010

getting started with Ngrams

Ben Schmidt writes the smartest thing I’ve yet seen about Google’s Ngram project:

But for now: it's disconnected from the texts. This severely compromises its usefulness in most humanities applications. I can't track evolutionary language in any subset of books or any sentence/paragraph context; a literary scholar can't separate out pulp fiction from literary presses, much less Henry James from Mark Twain. It was created by linguists, and treats texts fundamentally syntactically--as bags of words linked only by very short-term connections--two or three words. The wider network of connections that happen in texts is missing.

Don't doubt that it's coming, though. My fear right now is that all of the work is proceeding without the expertise that humanists have developed in understanding how to carefully assess our cultural heritage. The current study casually tosses out pronouncements about the changing nature of 'fame' in 'culture' without, at a first skim, at least, acknowledging any gap at all between print culture and the Zeitgeist. I know I've done the same thing sometimes, but I'm trying to be aware of it, at least. An article in Science promising the "Quantitative Analysis of Culture" is several bridges too far.

So is it possible to a) convince humanists they have something to gain by joining these projects; b) convincing the projects that they're better off starting within conversations, not treating this as an opportunity to reboot the entire study of culture? I think so. It's already happening, and the CHNM–Google collaboration is a good chance. I think most scholars see the opportunities in this sort of work as clearly as they see the problems, and this can be a good spur to talk about just what we want to get out of all the new forms of reading coming down the pike. So let's get started.

Yes, yes, yes. Let the traditional humanists stop sneering; let those on the digital frontier shun the language of “reinvention” and avoid suggesting that they have rendered other approaches to the humanities obsolete. (There aren't many that arrogant, but there are a few.) Also, this project does not create a new field. “Let’s get started” is just the right note to strike.

(The article from Science that Schmidt mentions is here.)

UPDATE: If you want to get some thoughts from someone who, unlike yours truly, actually knows what he's talking about, check out Dan Cohen.


