Cites & Insights: Crawford at Large
ISSN 1534-0937
Libraries · Policy · Technology · Media

Selection from Cites & Insights 5, Number 14: December 2005


OCA and GLP 1:
Ebooks, Etext, Libraries and the Commons


1.    How many books has Project Gutenberg digitized and made available online?

2.    Is the e-journal Cites & Insights available in HTML form?

3.    Will the Open Content Alliance (OCA) make ebooks freely available?

4.    Will the Google Library Project (GLP) make ebooks freely available?

5.    Is the in-copyright portion of GLP fair use?

6.    Does GLP harm book sales?

7.    Does GLP harm authors?

8.    Will OCA and GLP replace online catalogs?

9.    Will OCA and GLP weaken libraries?

10.  Will OCA and GLP strengthen the commons?

11.  Should librarians struggle to assure that OCA, GLP, and related efforts don’t overlap?

12.  Should (do) people read books from beginning to end?


1.    None.

2.    No.

3.    Probably.

4.    Sort of—but only with a forgiving definition of “ebooks.”

5.    Nobody knows—and nobody knows whether a court trial on this issue will be a very good or a very bad thing.

6.    No.

7.    No.

8.    No.

9.    No.

10.  Yes.

11.  No.

12.  Yes and no.

This Perspective is just that—my perspectives (and what I’ve gleaned from others) on what’s going on with OCA and the Google Library Project (GLP) and implications for copyright, ebooks and etext, libraries and the commons. A separate essay considers some of what’s been published (primarily on the web) about the two projects.

Expanding on Those Answers

1.    Project Gutenberg does not digitize books and make them available. It digitizes the texts and makes those available online (and in CD/DVD collections). That’s an important distinction, although it’s one that gets confused when you talk about “ebooks.” In Project Gutenberg’s case, it’s clear enough. The plain-ASCII files omit all typography and design, sometimes omitting even chapter headings. That’s great in terms of low-bandwidth downloads and full-text manipulation, but it deals with e-text: The text of a book. (A true booklover would say that even making digitized pages available isn’t making the book available, since the quality of paper and the binding are also involved. I’m taking a middle ground.)

2.    The text of most essays in Cites & Insights since 2004, including all essays in recent issues, is available in HTML form. But Cites & Insights isn’t just text—it’s also a deliberately designed print-oriented publication, using carefully chosen typefaces and typographic devices. The HTML essays include some of the typographic devices (titles, headings, block-indented quoted material, bullets, italicized and boldfaced text) but omit much of the design of the ejournal itself. In other words, “text in Berkeley Oldstyle Book set at 11 points on 13 point leading” is an integral part of what defines the ejournal—but not the text within the essays.

3.    There are so many definitions of “ebook” that no definitive answer is possible here. OCA does plan to provide digital facsimiles of book pages, which taken together constitute one definition of an ebook (not just the etext for a book). That’s why PDF will be at least one standard form of OCA availability: It’s one way to preserve the design of a printed book. Offered as coherent downloads, I’d call OCA’s offerings ebooks. OCA is also likely to offer print on demand (PoD).

4.    GLP allows on-screen reading of digital replicas of book pages, but does not allow coherent downloading of complete books. It also doesn’t allow bookmarking (as far as I can tell): if you read pages 1-30 of a GLP “book” in one session, you’ll have to go through those pages again in the next session to get to page 31. It takes a broad definition of “ebook” to include what GLP provides, although that could change. It’s more booklike than Project Gutenberg (in the sense that typographic integrity is maintained), but it’s less “ebookish” since you can’t download the book or mark your place. Karen Coyle has suggested that GLP is “creating a lot of automated concordances to print books,” and that’s partly true, except that the concordances are bundled into one huge metaconcordance, and for in-copyright books GLP only shows the first three occurrences of a word or word combination, unlike a proper concordance.

5.    In my opinion, it should be—even though I’ve also said in the past that it probably isn’t. Not because Google will be “making in-copyright books available online”—the project is quite clear about not doing that, and I can’t for the life of me turn three paragraphs of a book into a portion that would violate any definition of fair use. The problem is the complete cache that lies behind the full-text indexing and provision of those three snippets: That’s a copy by most current definitions and some authors and publishers claim it’s copyright infringement. I’d like to believe that I’m wrong in my earlier opinion, and lots of people who know more about copyright than I do seem convinced that it is fair use. The problem with a court trial is that it could either expand the explicit realm of fair use (ideally shifting owner’s control toward digital distribution, eliminating cached copies as potential infringements), or it could help undermine digital fair use by finding for the publishers and authors. On balance, I hope the court case goes forward—but I’ll be surprised if it does.

6.    GLP will not make in-copyright books available for free, and as currently described won’t make it easy to read most public-domain books for free. By encouraging discovery for relatively obscure works, Google Print should increase book sales, giving a little more visibility to non-bestsellers (the “long tail” if you need Wired-inspired jargon for longstanding phenomena).

7.    How could it harm authors to make their works more visible? Well, OK, it might harm some authors—those whose writing or thinking is so bad that three paragraphs turn off potential buyers and those whose works are clearly inferior to lesser-known books that GLP makes visible. The claim that GLP hurts authors or publishers because it deprives them of some theoretical market for making their books full-text indexed online or leasing the books so someone else can do it is, I believe, implausible.

8.    I believe that the visibility of the first chunk of Google Book Search is starting to clarify this situation. Full-text searching of book-length text just isn’t the same as good cataloging, quite apart from the fact that OCA and Google Book Search won’t usually provide instant access to local availability or combine circulation with cataloging data. Not that full-text book searching isn’t valuable; it is, but its role is complementary to that of online catalogs. The projects might hasten the improvement of bad OPACs; that’s not a bad thing.

9.    I believe OCA and Google Book Search (formerly Google Print) will both strengthen libraries by making works more visible, particularly with links to library catalogs and metacatalogs for local holdings. Even with OCA’s full-download capabilities, most users are likely to prefer a print copy for those texts that they wish to read at length. Forward-looking libraries will be working to provide links between OCA, Google Book Search and their own services; some already are.

10.  OCA should definitely strengthen the commons by making substantial quantities of public-domain material available—and, as currently planned, by helping to define the public domain itself by identifying post-1923 books with lapsed copyright. As for GLP, it really depends on how the project progresses and the extent to which Google decides to cooperate and interoperate with OCA, Project Gutenberg, and other digitization and etext projects. At the very least, GLP will make pages from public domain works available, which strengthens the commons (although not as much as the open approach of OCA).

11.  Chances are GLP will digitize the same “book” (that is, same edition of a given title) more than once if it succeeds in its overall plan. Since OCA isn’t one digitizing plan but an umbrella for a range of related initiatives, it’s even more likely that the same edition will be scanned more than once, particularly when you combine OCA, GLP and other projects. If the digitization really is non-destructive, fast, and cheap, that may not matter. The costs (in time and money) of attempting to coordinate all such projects in order to prevent redundant scanning may be higher than the costs of redundant scanning and storage. As for semi-redundant scanning—that is, scanning more than one edition of a title or more than one manifestation of a work—it’s not at all clear that avoiding such semi-redundancy is desirable, even if feasible. Lightweight methods aren’t necessarily the most desirable for every project; for a loose network of low-cost book digitization projects, however, keeping the bureaucratic overhead light may be essential.

12.  Yes: The vast majority of fictional works are, I believe, read through—and that’s certainly how they’re intended to be read. Yes: A high percentage of narrative nonfiction books, including both scholarly monographs and more popular works, are designed to build a case and are best suited to through-reading—and, I’ll suggest, are read through from beginning to end by most readers in most circumstances. No: Lots of books aren’t designed for through-reading, and in many cases a reader can effectively use a portion of a book that is designed for through-reading while ignoring the rest. (Alane at It’s all good assaults Michael Gorman’s statement, “The point of a scholarly text is that they are written to be read sequentially from beginning to end, making an argument and engaging you in dialogue,” in a November 16 post, calling it “arrant nonsense” and citing a 1985 survey as evidence to the contrary. While I agree that Gorman overgeneralized—some scholarly texts are written to be read through—the headline on the post also overgeneralizes: “How people use books.” I believe that 60% of a sample group consisting of 69% hard scientists responded to a question about how “you use a volume from the library these days” by saying they read 10% or less. Since “volume” and “scholarly monograph” aren’t at all the same thing, and since hard scientists have largely abandoned monographs for journal articles (I believe), this finding has little to do with Gorman’s assertion. In any case, how texts are intended to be read and how they are used aren’t necessarily the same thing. “Real people interact with real texts” in many ways—almost any generalization is likely to be false, certainly including Gorman’s.)

The Ebook-Etext Confusion

I can hear voices already: “Why should an ebook maintain the typography and pagination of a print book? Why shouldn’t it be a different experience?”

It is a bit presumptuous of me to define “ebook” so as to exclude booklength etext with none of the attributes that turn a text into a book. After all, I split “ebooks” into nine models five years ago (“Nine models, one name: Untangling the e-book muddle,” American Libraries 31:8 (September 2000): 56-59): proprietary ebook devices, open ebooks, public-domain ebooks, circulating pseudobooks, “digital to physical” (PoD), “not quite a book” (brief etexts such as Stephen King’s Riding the bullet), e-vanity/self-publishing, ebooks before the web, and “extended books” (systems that provide extensions beyond book capabilities). If anything, the situation has become more confused since then.

Here’s how Karen Coyle put it:

It’s debatable whether you can call [Project Gutenberg]’s offerings “e-books.” They are definitely e-texts, but they lack nearly all of the qualities that you would desire in a book, which is why use of their texts has not been as stunning as the hype around PG. PG texts lack:

-paging and page numbers

-any ability to navigate by page or chapter, or link from indexes, etc.

-the look and feel of a book, i.e. pleasant fonts

-the ability to have illustrations and figures

-the ability to have footnotes.

And, as she says, pagination does matter if you plan to cite passages, use a book for classroom discussion, etc. Given all that, I believe it’s reasonable to call plain-ASCII transcriptions of book-length materials “etexts” rather than “ebooks.”

Sure, an ebook can and in some cases should be a different experience from a print book, but that experience should still involve elements that make a book something more than text. Breaking away from the print paradigm requires thought, not just transcription. This isn’t to call PG texts useless, but they’re something short of ebooks.

So, I believe, are GLP public domain offerings, at least as currently planned. They’re closer in some ways, but further away in others. Again, that could change—if Google decides that it’s in Google’s interest to provide a mechanism for bundling the set of page images that makes up a book and downloading it to a device that can treat it as a coherent ebook. Again, this doesn’t make GLP and Google Book Search useless or even less than potentially spectacular—but it also doesn’t necessarily make them into ebook factories. Nor is that what they’re intended to be, if you believe Google itself: They’re ways to find books more than they are ways to read books online. (Thus the name change—between the time these essays were first written and the time they appeared!)

E-ink and E-paper

Recent Cites & Insights pieces have featured discussion of e-ink and e-paper, and my doubts about both. Perhaps I should clarify my feelings in the context of ebooks and libraries.

I want to see an e-ink/e-paper that works for something other than yet more ways to sell stuff: Something that would allow a print-like reading experience and avoid some of the pitfalls of dedicated ebook devices and reading ebooks on computers. I want to see that for at least three reasons:

-    Many “ebooks” serve their purposes better than print equivalents. Setting aside archival issues, it makes more sense to offer such things (fast-changing reference works, volumes of material where only a few pages need to be read at any time, textbooks in most cases, and more) as ebooks with the readability of print books. That should be a multibillion-dollar market, if it’s handled right, and could serve to replace print where print performs worst.

-    Some readers have good reasons to prefer some form of digital reading device over current magazines and books, for example those who need capabilities that good epaper devices might provide (expandable type, for example). Some others may be so dedicated to all things digital that they prefer to use a digital reading device, or really want to carry “a thousand books at once” (which may also mean that their device has $7,000 to $20,000 of documents on it…).

-    Personally, I believe ebooks as wholesale replacements for print books and magazines are a solution to a nonexistent problem, now and in the medium-term future. I believe most people in most generations (including the supposed mutant kids) will prefer to read most narrative booklength texts and most magazines in print form. But if I’m wrong (which is certainly possible!), I’d like people to have the best possible digital reading experience, and good e-ink/e-paper might offer that experience.

Ebooks, Etext, E-ink, E-paper and Public Libraries

Do I believe any of this endangers libraries? Only in two nightmare scenarios, where publishers decide to shift all publication to locked-down ebook forms with truly draconian DRM—or, worse, where digital publications are made available entirely on rental “pay per read” basis.

Either of those scenarios has the potential to wreck public libraries in their role as the commons of shared resources. That wouldn’t necessarily doom libraries, but it would eliminate what I consider to be their most essential role, one of the few that can’t be replicated readily by other public agencies.

I don’t believe that will happen—partly because I don’t believe the nightmare DRM scenarios are likely, and don’t believe wholesale conversion to e-reading is likely in my lifetime or yours. Actually, I believe the Sony BMG scandal (see elsewhere) may help alert the public to the general problem of DRM. That’s a good thing—but one effect that some digiphiles may consider undesirable is that it’s likely to lessen publisher interest in converting to ebooks.

Cites & Insights: Crawford at Large, Volume 5, Number 14, Whole Issue 70, ISSN 1534-0937, a journal of libraries, policy, technology and media, is written and produced by Walt Crawford, a senior analyst at RLG.

Cites & Insights is sponsored by YBP Library Services.

Hosting provided by Boise State University Libraries.

Opinions herein may not represent those of RLG, YBP Library Services, or Boise State University Libraries.

Cites & Insights: Crawford at Large is copyright © 2005 by Walt Crawford: Some rights reserved.

All original material in this work is licensed under the Creative Commons Attribution-NonCommercial License. To obtain a copy of this license, send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.