Cites & Insights: Crawford at Large
ISSN 1534-0937
Libraries · Policy · Technology · Media

Selection from Cites & Insights 6, Number 1: January 2006

Followup/Feedback Perspective

OCA and GLP Redux

I was mistaken in dismissing Project Gutenberg as nothing but etext. Although etext—plain ASCII with none of the appurtenances of books—continues to be Michael Hart’s thrust and most prominent in descriptions of Project Gutenberg (including those on the site itself), there’s more to it than that.

Bruce Albrecht sent a careful explanation:

I would like to take exception to the several places in the December 2005 edition of Cites and Insight where you dismiss the Project Gutenberg as merely a library of e-texts as opposed to e-books, which are clearly better.

In the lowest common denominator form, PG texts are, as you say, only etexts. However, many, if not most of the new works contributed to PG these days from Distributed Proofreaders also include a secondary HTML version which include all the features of an e-book that Karen Coyle claims work from PG lack. For example, consider A Study of Pueblo Pottery by Frank Cushing, The HTML edition has a linked table of contents, page markers, illustrations (with links), an index (with links), in short, everything most scholars want and/or need in an e-book. The catch, however, is that not everything produced by DP and the other PG contributors include an HTML edition with these features.

The trend at DP is to require HTML editions, as well as retain the publication information from the original source material. There are also projects at DP to replace early PG editions with new editions with illustrated HTML editions when illustrated sources are available to the DP volunteers.

You make it clear that you consider the typography and layout of your e-journal to be an integral part of it. That is certainly your prerogative. However, in most instances, the author of a book (maybe more so than the editors of a magazine or journal) is really at the mercy of the publishers, and has little or no say in the layout of the published book. Rarely do subsequent editions of a printed book retain the layout (and often the illustrations) of the earlier edition. With the PG HTML editions, selection of the fonts used are at the discretion of the reader, and the HTML is hopefully sufficient to accommodate the selections by the reader, including, for example, large type for the visually impaired.

There are volunteers at PG working on a standard, PG-TEI, based on TEI-LITE, which would provide a master file that would be able to generate multiple formats, including straight ASCII, HTML, PDF, and various mobile reader formats. This format would retain all the information about the original source book except perhaps the typography. There are already a few books in the PG library in this format, and converters to several of the formats listed above.

My only disagreement with this letter is “several” in the first paragraph. I would say that I referred to PG in that manner once, or maybe 1.5 times. Otherwise—well, I was wrong. I find Michael Hart so grating that I’d simply ignored PG, and a quick look at the home page did nothing to uncover the 3,000 or more HTML versions (there are also a few PDFs). There is a note on HTML deep within the FAQ, but it’s certainly not obvious.

Tonya Allen also wrote to inform me of PG’s expanded set of formats, noting of my “etext” claim:

While this was the status quo perhaps five years ago or more, it is not true now. All PG texts these days come in plain ASCII, but most recent additions (last several years) also are also available in 8-bit text and HTML. All versions include chapter headings (and footnotes if in the text); HTML versions naturally provide links from chapter headings, indexes, and footnotes, and include illustrations and figures, as well as “pleasant fonts”; and many HTML versions also include page numbers.

Ms. Allen suggests that I point you to the Project Gutenberg catalog, and adds that major classics are more likely to come from PG’s early years and to be available only in ASCII.

Is it an ebook?

Is a typical Project Gutenberg HTML version a full digital representation of a particular edition? No.

Is it an ebook? In most ways that matter, yes—particularly when it includes pagination sufficient to allow precise citations.

Am I saying etexts are useless? Of course not. They can be particularly useful for data mining and various sorts of text analysis, and there are tools to turn PG’s plain text into a fairly pleasant reading experience (if you find reading from the screen pleasant under any circumstances).

So let’s say that Project Gutenberg includes some unknown number of true digital replicas of books (in PDF or other form), several thousand ebooks (in HTML), and many more thousand etexts.

Is a digital replica in downloadable form better or worse than an HTML or XML-based ebook? Neither. It’s different. It’s better for some purposes, worse for other purposes. A downloadable combination of digital replica (probably in PDF/A form) and XML, with the option to download one or both formats, might be ideal. For all I know, OCA and other projects could result in such combinations.

Open Content Alliance

I missed this in the big essay: By October 31, OCA had added dozens of new members, including libraries such as those at Columbia, Johns Hopkins, Virginia, and Pittsburgh, as well as Smithsonian Institution Libraries and others. As reported by Barbara Quint in Information Today, there’s also some detail on the scanning process. The Scribe system used by the Internet Archive for OCA scanning involves a book cradle with a spine-friendly 90° angle, a glass platen to hold the page flat, manual page turning, and full-color scanning at “about 500 pixels per inch.” Digitized collections are triply replicated in overseas locations as safeguards.

Roy Tennant’s December 15, 2005 Library Journal column discusses OCA (Tennant’s employer, California Digital Library, is a member; my employer, RLG, is a partner). He stresses that digitized files with associated metadata will be available for complete downloading, so you could build your own interfaces, and that the whole process is as open as possible (for example, the agreement between UC and the Internet Archive was made available to the library press “days after the initiative was announced”). OCA “is based on respect for collections.” The column is a fine short summary of OCA, and includes this paragraph:

It’s unclear whether the OCA project will rival the Google Library project in size. Since it is easier for organizations to participate, the OCA will easily have more participants, but the Google project may lead in the number of digitized volumes if it fulfills its promise. Only time will tell. In any case, more digitized content is likely a better thing overall.

Although this is as closely related to GBS as to OCA, I’ll note it here. A number of libraries and consortia have replicated a finding that turns out to be true in many studies: When you study the overlap among real-world groups of libraries, roughly 60% of the holdings are unique (that is, held by only one library within the group). For the MOBIUS union catalog, it’s 63.5%; for CARLI, 63.8%; for OhioLINK, 58.5%; for Prospector, 65%. As reported last issue, for the Google 5 it’s 60%. Conclusion: It would take a lot of libraries to digitize “everything.”

Google Book Search

The November 21, 2005 New York Times has a Katie Hafner article on Sidney Verba (Harvard University Library director) and Google. Verba’s reading of Google Book Search for copyrighted books: “The thing that consoles me is Google’s notion of showing only the snippets, which have everything to do with what’s in the book, but nothing to do with reading the book.” If I read that correctly, he’s saying it’s all about finding, not displacing the books themselves. Pat Schroeder of the AAP is consistent—in an odd manner: “Look, people should be able to search all this stuff, but it should be the author’s choice and not Google’s.” Two points there: AAP speaks on behalf of publishers, not authors, and it seems unlikely that typical book contracts would leave the choice up to the author. More significant is this wholly new concept: that you need the permission of a copyright holder to index a published product.

Verba’s not too worried about displacing libraries: “[W]hat this does is take you to Google, which takes you to the library.” He wasn’t an instant convert to the project: He wanted details and got them. (There’s a little journalistic misstep later, saying that Google “had built its own scanners, which capture the image of the page using optical character recognition technology.” That’s nonsense: The scanner captures the image using scanning technology; the searchable text is prepared using OCR.)

The Ethicist via ACRLog

I was surprised to read on ACRLog that “The Ethicist” on All Things Considered likened Google’s opt-out offer to “a burglar requiring you to list the things you don’t want stolen.” The Ethicist was talking with Tony Sanfilippo, who in a November 28 essay states that the Google Library Project “is being done outside the scope of traditional copyright protection,” dismissing the possibility that fair use applies. Sanfilippo says the project “may irrevocably hurt the production of knowledge in the future” and has this to say about the contract (which returns a digital copy of the library’s scanned books to the library): “Using an unauthorized full copy as a payment is clearly a copyright infringement.” Interesting, given that the libraries—which own copies of the books—would arguably be justified in making their own digital copies. Is it suddenly illegal because the libraries subcontract the actual scanning to a third party?

It turns out that Sanfilippo’s making a different case: His employer, Penn State Press, wants to sell its own digital copies of books to libraries that already own the print copies. If it can’t do that, “many new books won’t get published,” which turns into this clarion cry: “Do we want to chuck the whole commercial model for the production of scholarship?” (That’s an interesting question, but rhetorical overkill given the situation at hand.) And, of course, Sanfilippo uses the term “theft” to describe the situation. (The person posting the All Things Considered entry found it impossible to believe that the University of Michigan would illegally distribute its digital copies, then went on: “What Google might someday do…well, that’s harder to predict.” One would presume that the contract and copyright law would help guide Google’s future plans: A successful corporation seems unlikely to risk near-certain copyright infringement suits with ruinous statutory damages by making the actual pages of in-copyright books available without prior agreement. Unless, of course, Google is suicidal, which seems highly improbable.)

I posted a comment on the ACRLog post offering a different analogy from that offered by The Ethicist: “I’ll make a photocopy of that poster you printed up to sell, borrowing it from someone you sold it to. I’ll index that poster online, telling people where they can buy or see a copy—but I won’t show a significant portion of the poster to anyone.” As I said then, I care about ethics as much as anyone, and darned if I can find an ethical problem with that proposition.

Morris says fair use…and other voices

A surprising voice in favor of GLP being fair use: Sally Morris of ALPSP. Morris says Google agreed with ALPSP and others that “it was absolutely the case that it is not allowed to [digitize in-copyright material from libraries] in Europe.” Fair use isn’t part of European copyright law; “fair dealing” is narrower. So far so good, but Morris went a little further, in a quote which will no doubt endear her to AAP:

The fact Google recognizes they can’t do this without permission in Europe gives us a threshold to work out a way for them to get permission. In America, they have the law on their side. Here, they accept they don’t. [Emphasis added.]

One publishers’ association has gone on record, in the person of its CEO, saying fair use does apply in this situation: Google has the law on their side. Amazing.

An odd commentary appeared November 28 in Times Online: “Help, we’ve been Googled!” by William Rees-Mogg, “non-executive chairman” of Pickering & Chatto. P&C is an “academic publisher” that primarily publishes collected editions of major authors, edited and indexed, sometimes with original material added. In other words, they’re taking public domain text (at least in some cases) and adding value. Now P&C’s “sturdy, early 19th-century business model” is “threatened by a giant 21st-century business model, the omnivorous Google.” You could stop right there and say that many two-century-old business models have required revision or abandonment in the 20th and 21st centuries. But no. After calling Britain’s copyright deposit requirement a “subsidy” by the publisher to the deposit libraries, Rees-Mogg says this, referring to “books that are still in copyright and will remain so for 70 years or more” (albeit books that consist predominantly of public-domain text, which he doesn’t bother to mention):

If Google can scan these books, without the permission of the publisher, and include them in its database, then most libraries will not need to buy them. And if librarians do not buy them, they cannot be published. The whole world of learning will be damaged, and academic publishing will cease to be a viable business.

Set aside the notion that academic publishing as a whole will disappear if P&C has trouble selling edited public domain works and claiming copyright because of the editing and indexing. This statement makes no sense unless Google is displaying the full text of in-copyright books. Never in the essay does Rees-Mogg state the clear, publicly available, flatly stated truth: That no more than three tiny snippets of any in-copyright book will be displayed without prior permission from the publisher. It’s possible that he’s ignorant, but that seems unlikely. More likely, he’s assuming that most newspaper readers won’t be aware of what Google’s actually doing; it’s a pure scare tactic.

Here’s Rees-Mogg’s assertion of the purpose of AAP’s suit: “The purpose of this application is to force Google to charge for viewing a copyright book, and to share the profit.” Interesting. In his closing statement, he says that the very “survival of the book” (not just academic publishing, not just collected editions of the work of dead writers) “depends on” Google “accept[ing] the rights in intellectual property.” Which, of course, it does; thus the snippets. (Peter Suber has a briefer and probably entirely adequate comment on Rees-Mogg’s assertions: “But this is just wrong.”)

Keith Kupferschmid of the Software & Information Industry Association, another hard-line copyright group, wrote a “Viewpoint” in the December 2005 Information Today, “Are authors and publishers getting scroogled?” That’s one of those questions that answers itself. My copy of the article has so much red ink and so many marginal scribbles from my first read-through that I hardly know where to begin; my comments would be nearly as long as the article. Go read it yourself, but read “Google’s side” in a later issue as well. I’ll let it go with Kupferschmid’s judgment as to the results of Google winning on its claim of fair use: “In essence, the rights of writers and publishers would likely cease to exist in the online world.” No hyperbole here!

Susan Crawford (no relation and she is a lawyer) reports briefly on a December 14, 2005 panel talking about GBS; she was a participant. The current argument of publishers is that Google’s Library Project can’t be fair use because it could affect potential markets. That’s a pretty good way to eliminate fair use entirely, since almost anything could be a potential market. Her comment:

The world is sufficiently unpredictable that anything could happen, right? So fair uses that threaten any possible secondary market can’t exist, according to the publishers. In effect, they’d like to use copyright law to protect against network effects and first-mover advantages that they can’t personally monetize.

I very much hope that Google won’t settle this case. We need these issues decided.

Partners and mythbusters

The University of Michigan and Stanford University have both issued recent memos on their relationship with Google. In Michigan’s case, it’s a “Statement on use of digital archives” dated November 14, noting what the library intends to do with the digital copy of its books that it receives back from Google: preserve the copy in a digital archive, a “dark archive” at least initially (that is, not accessible but there for long-term archiving); define use by the nature of the work (respecting copyright); secure the archive for long-term use. It could be used for natural disaster recovery (working with copyright owners), access for the disabled, and possibly computer science research on the aggregate full text. The library will not reduce acquisitions because of the digital archive, use it as an excuse not to replace worn/damaged works, or use it to provide classroom access to in-print works. In other words, Michigan will respect copyright, just as you’d expect. “Merely because the Library possesses a digital copy of a work does not mean it is entitled to, nor will it, ignore the law and distribute it to people who would ordinarily have access to the hard copy.”

Stanford issued “Stanford and Google Book Search statement of support and participation” on December 7, 2005. The memo says why Stanford’s participating in the Library Project (in short, “to provide the world’s information seekers the means to discover content”) and clarifies that for in-copyright books “this project is primarily supportive of the discovery process, not the delivery process.” Google has been scanning works from Stanford since March 2005, starting with federal government collections (inherently public domain). After those are scanned, Stanford will focus its contributions on works published up to 1964 that are believed to be in the public domain (works between 1923 and 1964 for which copyright was not renewed are in the public domain). The memo also makes clear that “Stanford’s uses of any digital works obtained through this project will comply with both the letter and spirit of copyright law.” Stanford expects the files to support preservation, better discovery tools, links to Stanford’s online catalog, and delivery of full-text digital content when such delivery is legal. Stanford does not intend to “violate the legitimate rights of content owners to control the distribution and exploitation of works under copyright.” The memo goes on to discuss litigation against the Google Library Project, expressing the belief that courts will find Google’s project to be fair use. It’s a substantial discussion; a piece of it deserves direct quotation:

Historically, copyright law has allowed the copying of works without permission where there is no harm to the copyright holder and where the end use will benefit society. Here, there could be nothing objectionable under copyright law if Google were able to hire a legion of researchers to cull through every text in the Stanford University Libraries’ shelves to ascertain each work that includes the term “recombinant DNA.” There could be nothing objectionable with those researchers then sharing the results of their efforts and providing bibliographic information about all works in Stanford’s libraries that include this term. Through the application of well engineered digital technologies, Google can simulate that legion of researchers electronically through algorithms that can return results in seconds…

Let’s wrap up this piece of a continuing story (except for a teeny-tiny extra below) with Donna Wentworth’s refreshingly sensible December 5, 2005 post at Copyfight, “Copyright mythbusters: Believe it or not, fair use exists.” I frequently disagree with at least some of the people at Copyfight, but I certainly can’t find fault with this post, which I recommend. She’s mostly citing other people’s “mythbusting” (yes, including mine, in brief) and noting “the usual heaping helping of copyright disinformation.” The first two paragraphs:

One of the more frustrating things about debating copyright issues is that copyright mythology sounds a lot more like the truth than the truth. For instance, many people believe that copyright law gives the copyright holder absolute, immutable control over a work, lasting into perpetuity. The truth—that copyright has built-in limits to protect free speech, scholarship, research, and innovation (the “progress of science and useful arts”)—sounds like a lie. Surely all of that stuff is just bleeding-heart liberal, mushy-minded nonsense?

Oh, well, actually—no. Fair use exists, and for very good reasons.

As some continue to seek a middle ground on copyright issues, it’s useful to remember that fundamental copyright law in the U.S. implies the need for balance.

Sivacracy: A risky gamble with Google

There is one more thing, and it turns out to require extended commentary. Siva Vaidhyanathan published a fairly long essay, “A risky gamble with Google,” in the Chronicle of Higher Education; it is also posted on his blog. While there are elements of the essay that I agree with, and I certainly agree that Google should be treated with caution (as should almost everybody), I find the essay as a whole troubling and unconvincing. Portions of it seem to suggest that private corporations are inherently bad; I may find that disconcerting because I work for one (albeit a nonprofit). Come to think of it, New York University (Siva Vaidhyanathan’s employer) is also a private corporation…

Vaidhyanathan summarizes, “It pains me to declare this: Google’s Library Project is a risky deal for libraries, researchers, academics, and the public in general. However, it’s actually not a bad deal for publishers and authors, despite their protestations.” I agree with the second sentence. As to the first—well, life is a risky deal, and there’s some risk in any arrangement. Is GLP unusually risky and unwise? I don’t believe Vaidhyanathan makes the case.

He says millions of bound books will be digitized from “five major English-language libraries” and goes on to say it will make available “excerpts from works still in copyright.” The first note is an odd one: Fewer than half of the books in the Google 5 libraries are in English. The second is misleading albeit factual: Yes, the sentence or two that makes up a Google snippet is an excerpt, but most of us would take “excerpt” to mean something more substantial.

After saying he’s “thrilled and dazzled” by the potential of the project, he says:

But, as we all know, we should be careful what we wish for. This particular project, I fear, opens up more problems than it solves. It will certainly fail to live up to its utopian promise. And it dangerously elevates Google’s role and responsibility as the steward—with no accountability—of our information ecosystem. That’s why I, an avowed open-source, open-access advocate, have serious reservations about it.

Depending on what “utopian promise” you believe Google is making, I’m inclined to agree that it may fail to reach that utopia. So what? Let’s say Google gives up after digitizing half of Michigan’s collection and a total of 100,000 books from the other four libraries: How will this harm anyone?

How does GLP “elevate Google’s role and responsibility”? Who makes Google the steward “of our information ecosystem”? Is there no room for complementary projects such as, say, OCA or the Million Books Initiative? Has Google arranged a deal that requires shutting down the rest of the “information ecosystem”? I find no answers to those questions that turn Google into a threat.

Vaidhyanathan notes correctly that, although Google has become a “ubiquitous brand,” it still handles less than half of Web searching in the U.S. That would seem to be less reason to fear Google as “the steward of our information ecosystem.” But somehow, Google “must continue to convince the world that it is the anti-Microsoft,” a case I’ve never heard Google try to make. Vaidhyanathan offers Google a very backhanded compliment: “The damage Google has done to the world is minimal.” Google “seems to provide users a service at no cost” and “we are led to believe that Google search results are determined by peer review” (that is, PageRank). I’m a bit astonished by the apparent view that the net benefit of Google’s index (and the improvements in Yahoo!, MSN Search, and others brought about by competition) is “minimal damage to the world.”

Then he lets loose after quoting two of the admittedly more extreme statements from Google and its cofounder (you all know the first one, and most of you’ve read Sergey Brin’s “The perfect search engine would be like the mind of God.”):

Both quotations should worry us. Is it really proper for one company—no matter how egalitarian it claims to be—to organize all the world’s information? Who asked it to? Isn’t that the job of universities, libraries, academics, and librarians? Have those institutions and people failed in their mission? Must they outsource everything? Is anyone even watching to see if Google does the job properly?

Now I see why I launched into a torrent of unanswered questions above: It’s catching! My responses to Vaidhyanathan’s questions—well, you can guess. Google neither has nor claims exclusive rights to organize information. As to the third question—should LexisNexis, Dialog, and every other abstracting and indexing company be attacked for not being a university or library? In practice, no, it’s not the job of universities and libraries to “organize all the world’s information”—at least I don’t believe it’s a realistic expectation. “Must they outsource everything?” They’re not. And if you’re one of us Luddites who believes full-text indexing doesn’t replace good cataloging, libraries aren’t “outsourcing” anything to Google. Google has ambitious and (I believe) unreachable goals. That doesn’t automatically turn it into either the devil or the sole organizer of anything or anyone except the Google index.

Vaidhyanathan claims to “examine the Google Library Project in depth” and you’ll have to come to your own conclusions as to whether he does. I’ll point out a few troubling items. He says “you can’t do much good research” if you’re not part of a university community, which is a slap in the face of public libraries and their licensed databases (and, for that matter, some ambitious statewide database licenses). He says “we could solve each of the problems…without Google” if only there were sufficient commitment (read: money). He says privacy has been a problem with Google not so much because of the supposed “access to all our search histories” but because people can find out things about other people using Google. You put your “long-lost sappy poems” on the internet and are later outraged because Google indexes them.

He says Michigan “abrogated its responsibility” on patron confidentiality by failing to demand a stronger pledge than is in the contract, a serious charge against a major university library. He goes on about the dangers of privatization and connects this to the cost of full-text databases. “Rapid privatization” simply isn’t involved: Those databases are made up of in-copyright material published by private publishers.

Here’s one where I say, “Perhaps true but so what?”: “The long-term risk of privatization is simple: Companies change and fail. Libraries and universities last.” Well, the second isn’t necessarily true; the first is (frequently) true. He cites the possibility that Google won’t be around a century from now as making it “imperative that stable public institutions take the lead in such an ambitious project.”

I don’t get it. If Google goes under or stops the project halfway through, Michigan (and other participants) have digitized copies of the books they own and still have the print books. Who’s been harmed? And where does Vaidhyanathan believe the money for a university-led digitization project of this scope and speed will come from? After all, Michigan probably does more and faster book scanning than any other university library—and it welcomes the Google project as turning a thousand-year nightmare into a six-year possibility.

There are more what if/what then questions, none of which suggests that any harm is likely. If Google Book Search ceases to exist, it has resulted in lots of reasonably high quality scanning: Not a bad thing. Put simply, “the public” is not going to fund a Google Book Search equivalent, at least not any time in the near future. If it did, it could be wonderfully complementary—there’s a lot of stuff out there.

Vaidhyanathan makes it simple: Google can’t win. “Beware any corporation that pretends to speak for the public interest.” This is in connection with Google’s new lobbyist, part of whose portfolio is to defend the notions of internet neutrality and fair use. I think it’s clear that Google is lobbying for principles that are in its own interest and the public interest, and I fail to see an inherent contradiction in such a notion. I’m sorry if this is offensive, but the U.S. is a mixed economy based on private enterprise. To assert that private enterprise is always and in all cases at odds with the public interest is just as absurd as “What’s good for General Motors is [automatically] good for the USA.”

Then Vaidhyanathan gets to copyright itself. He talks about the “efforts of millions of people to use their own culture as they see fit” and asserts that Google’s plan “further destabilizes the system.” Apparently, actually fighting for fair use is a bad thing, because it could destabilize a system that Vaidhyanathan calls “absurd.” Given his feelings about copyright, it’s interesting that he says opt-in copying “has worked fairly well in the real world.” Really? He then claims that the Google suit has to do with “the norms of the Web (opt out)” versus “the norms of the real world (opt in).”

But that’s nonsense. Google is not claiming that it has the right to make unlimited use of copyright print materials. It is claiming that it has fair use rights to index the full text of copyright materials, necessarily making a digital copy in the process, as long as that digital copy is not made available to anyone other than the original owner of the material. That’s quite a different matter—and I believe Vaidhyanathan knows this to be true.

Vaidhyanathan pushes the badly-decided case and the sensibly-decided Tasini case as examples to give Google pause, and says it comes down to this: Google shouldn’t take the case to court because, if it loses, “the principles of Kelly” (the Arriba Soft case that allowed thumbnail copies of copyright photographs) “are in danger. So are future similar initiatives, whether they come from libraries or the private sector.” Later, “A bad loss in the Google case could blow a massive chilling effect across all sorts of good ideas.”

There it is: Fair use is too precious to actually be defended in court. It’s like one of those designer gowns: Lovely on the rack, but if you wore it you might get it dirty. Of course, if fair use is never defended (we might lose, and that would have a chilling effect) then fair use ceases to exist—which is its own chilling effect. Vaidhyanathan says a bad ruling might “frighten university counsels” giving advice on fair use—but it’s hard to see how people could behave more timidly in this area! As he notes, “university counsels are already skittish enough.” There’s a circularity here…

Vaidhyanathan is concerned that “Google’s power to link files to people will displace the library from our lives.” But GBS as it applies to copyright materials will not link files to people—it will show them what books might be of interest, which they can then pursue in (ahem) libraries. I agree that Google’s indexing power does not “come close to working as a library.” So does Google.

Vaidhyanathan wants “services like that provided by Google Library”—but only if they’re “Library Library” projects. So much for Dialog. Kill off Ebsco Expanded Academic Index. Deep-six all the rest of those evil private indexes: If libraries don’t do it, it should not be done. That seems to be the theme here. And here’s the threnody:

Libraries should not be relinquishing their core duties to private corporations for the sake of expediency. Whichever side wins in court, we as a culture have lost sight of the ways that human beings, archives, indexes, and institutions interact to generate, preserve, revise, and distribute knowledge. We have become obsessed with seeing everything in the universe as “information” to be linked and ranked. We have focused on quantity and convenience at the expense of the richness and serendipity of the full library experience. We are making a tremendous mistake.

Someone not nearly as wise or important as Siva Vaidhyanathan once said “and, not or.” He—OK, I—believed that private and public institutions could and must work together, and recognized that libraries have always worked with private institutions. Google neither demeans nor threatens libraries (unless, of course, librarians say that ‘everything should work just like Google’ and abandon their own principles—which is not Google’s fault). Google supports libraries through word and deed. So do lots of other corporations, to be sure.

“We” have not universally become obsessed with information. “We”—the majority of the public, who use public libraries—have not abandoned the library experience. I dare say many of us have not lost sight of the ways people and institutions interact—and some of us recognize that some of those institutions are and have always been private for-profit institutions. Was the use of Dialog by libraries “a tremendous mistake”? Perhaps. If not, can’t we do a little better than this dismissal of private enterprise as inherently dangerous, unworthy, and, let’s be honest here, “evil”?

An ACRLog post on November 29 responds to that final threnody:

Well…I don’t know about that. We haven’t seen our libraries empty out as information goes online. I think libraries are as likely to be discovered as books are by their collections being searchable. Books will remain a viable format for sustained reading and engagement with ideas even if their contents can be found in snippets online.

But when it comes to the core values libraries have surrendered in order to let Google represent them in court—that’s certainly worth thinking about.

It is—but first I’d like clarification as to how Google is representing libraries in court. It’s defending fair use as defined by its own project. That’s not the same thing. Are libraries surrendering core values? I don’t believe so.

I would like to see more transparency in Google’s confidentiality policy as it applies to GBS. For that matter, if I worked for Google, I’d argue that more transparency in all of the Library Project (e.g., making the other four contracts public knowledge if the libraries agree) would serve Google well.

Tom Peters added a charming and elegant footnote in a December 5, 2005 post at the ALA TechSource blog, “Sinners in the hands of an angry search engine.” Accompanied by an illustration of that thundering preacher Jonathan Edwards (1703-1758), he notes the extent to which “deeper, inchoate fears” seem to be lurking. He finds four such fears made manifest in Vaidhyanathan’s essay:

• More problems, fewer solutions. Peters wonders whether that’s a bad thing: “Most developments of this type eventually create more problems… Civilization itself creates more problems than it solves.”

• The innocent bystanders have the most to lose. This is a crisp summary of the “defending fair use could damage fair use” theme.

• Google will kill libraries. As Peters points out (and Vaidhyanathan says along the way), there’s a lot more to the value of a library than the sum of its books. And, to be sure, GBS won’t replace those books, but more likely increase demand for them.

• Google is the Devil in the guise of God. There it is, never directly said by Vaidhyanathan but a theme I also picked up: Beware the corporation that says it does not do evil. Peters’ last note in this theme: “Perhaps Google is a manifestation of humankind’s hubris.”

This post is another of those cases where I say “I wish I could write like that.” Peters ends:

I find it fascinating that the moral and fear-based facets of this project are frequently hinted at in this debate, but rarely openly addressed. This controversy may reveal—in more ways than we care to imagine—who we are, who we think we are, and who we want to become.


Cites & Insights: Crawford at Large, Volume 6, Number 1, Whole Issue 71, ISSN 1534-0937, a journal of libraries, policy, technology and media, is written and produced by Walt Crawford, a senior analyst at RLG.

Cites & Insights is sponsored by YBP Library Services.

Hosting provided by Boise State University Libraries.

Opinions herein may not represent those of RLG, YBP Library Services, or Boise State University Libraries.

Cites & Insights: Crawford at Large is copyright © 2006 by Walt Crawford: Some rights reserved.

All original material in this work is licensed under the Creative Commons Attribution-NonCommercial License. To obtain a copy of this license, send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.