Thursday, September 9, 2010

Hire Some Librarians, Please

Over at Salon, Laura Miller sums up some of the search problems plaguing Google Books and interviews Geoffrey Nunberg, who raised the issues last year:

Nunberg, a linguist interested in how word usage changes over time, noticed "endemic" errors in Google Books, especially when it comes to publication dates. A search for books published before 1950 and containing the word "Internet" turned up the unlikely bounty of 527 results. Woody Allen is mentioned in 325 books ostensibly published before he was born.
Other errors include misattributed authors -- Sigmund Freud is listed as a co-author of a book on the Mosaic Web browser and Henry James is credited with writing "Madame Bovary." Even more puzzling are the many subject misclassifications: an edition of "Moby Dick" categorized under "Computers," and "Jane Eyre" as "Antiques and Collectibles" ("Madame Bovary" got that label, too).

It appears that Google, of all places, is having trouble understanding the function of metadata, though the root problem may be that scanning and data entry are outsourced to anyone willing to do the work, no matter how little training they have. Note to Google: I love this project, but please hire some librarians instead of outsourcing the work to Armenia.

(h/t The Millions)

