Cites & Insights: Crawford at Large
ISSN 1534-0937
Libraries · Policy · Technology · Media


Selection from Cites & Insights 5, Number 10: September 2005


Perspectives

Summertime Blahs

It’s summer, not always a lazy time for me—and this summer’s been extraordinarily busy at work and began with other issues. The net effect has been more than the usual summer lassitude. Not so much lassitude that I didn’t carry out the “biblioblogosphere” investigation, but enough to discourage Big Essays on Big Topics. So here are a couple of medium-size essays on topics that may (or may not) deserve better.

Iconoclasm and the Great God Google

This one’s been simmering for a while and bubbled up during the first half of July thanks to a spirited and sometimes bemusing set of threads on Web4Lib. I used the discussions as a springboard for my “disContent” column in the November 2005 EContent Magazine. If you want to see all the text, Web4Lib’s new home is web4lib@webjunction.org. (You may have to join in order to view the archives.) The threads include “Another Google question” and “Google limit of 1,000 results.” At least one of them started before July 6, which is when I picked them up.

Most of you are probably aware of the background. Google is now the most popular open-web search engine, although Google searches are still a minority of web search-engine searches according to most studies. Google has become synonymous with web searching, just as AltaVista may have been a few years back—but on a much larger scale as more people regard the web as a fact of life. Because Google’s ranking algorithms worked remarkably well for several years, it’s gained a stature that none of the other engines currently enjoys—even though, at least in my opinion, the algorithms don’t work nearly as well as they used to. Google has also acquired or created a number of other services, some still beta, some near-failures (e.g., Google Answers), some successes. This is all great and good, and affects libraries by providing a useful, easy-to-use tool for searching a broad new range of resources.

You can throw in Google Scholar and Google Print if you like—not for what they are now but for how they’re perceived. That perception, as with Google itself, is at the heart of the problem that resulted in some of the attitudes shown in this discussion.

The perception is threefold, overstated here for clarity:

•    Google can do no wrong and all Google plans succeed.

•    Google does right what library systems and services (online catalogs, fee-based indexes) do wrong—and the solution is for library systems and services to be “just like Google.” Not paraphrasing: “Google is all you need.”

•    For the pessimists, Google dooms libraries, because “everyone” “always” wants to use the simplest means possible to get something “good enough” (and, of course, because Kids These Days don’t really read print books or use anything that’s not digital).

There are related issues, but I’ll leave those for later.

One of those raising questions about Google’s unlimited and universal excellence for all online tasks is Roy Tennant, about as fierce a critic of traditional online catalogs as you’ll find. Roy knows there’s loads of room for improvement in library tools, but he’s also aware (as anyone working for the California Digital Library should be) that there are many different users with many different needs, and one set of tools won’t handle them all equally well. More specifically, Google doesn’t replace the scholarly apparatus and research databases—not even online catalogs.

I don’t recall whether Bernie Sloan or Patricia F. Anderson raised the point first (I think it was Bernie), but they noted what some of us have known for some time: You don’t know how large a Google result set is (except for small ones), because you’ll never see more than 1,000 records. For all I know, the “about 28,400” records for the phrase “Walt Crawford” (searched on August 8, 2005) could be something over 28,000—or the set could be the 975 records I get when I “repeat the search with the omitted results included” (which gives “about 28,600” as an unviewable set). Initially, I can only view 460 records.

Did the denizens of Web4Lib jump all over Google, shouting Gormanesque attacks and asserting that Google should never be used in libraries or for research? Not at all. The harshest attack I could find anywhere in the thread was Roy Tennant’s:

Google does one thing, and it appears to do that one thing well. But let’s not make the unfortunate assumption that it does more than that one, very specific thing.

Roy’s talking about “home Google” here. He’s pointing out that Google involves a specific set of assumptions about user needs with no way to change those assumptions. You can’t really get the most recent pages (and Google’s algorithm tends to bury new linking pages), since the date limit is mostly useless (in all web search engines: it’s the nature of web page dates)—and you can’t examine the entirety of most result sets.

This mild criticism and a slightly more pointed one that follows (related to Google’s version of link: results, which has become useless or misleading) are a form of iconoclasm: Suggesting that an icon isn’t all that it should be. Nobody—nobody—on Web4Lib was saying Google’s useless in general.

Battling Iconoclasm

Lars Aronsson said flatly, “no real searchers would be interested in more than the first 900 hits.” When offered real-world examples where researchers would want to examine an entire result set, Aronsson denied the validity of the examples. He wasn’t the only one. Mike Taylor opined:

We couple of hundred information professionals on this list care deeply about this stuff, but we do need to come to terms [with] the fact that no-one else does. As far as the other 5,999,999,980 people out there are concerned, Google is just fine. If we pretend otherwise, we’re hiding ou[r] heads in the sand.

Aside from the fact that Web4Lib’s membership is in the thousands, and that there are at least a couple of hundred thousand librarians in the world who should care, Taylor writes off several million researchers of all stripes—scientists, lawyers, doctors—for whom “Google is just fine” is an unethical and dangerous stance. (Karen Coyle responded: “The fact that they think it’s fine doesn’t make it fine. Ignorance may be bliss, but it’s a lousy basis for what purports to be an ‘information society.’”)

Jeremy Dunck was “surprised at the…glee that folks on this list seem to take at poking holes in information tech that’s available.” I saw no glee. I did see an attempt to discuss real-world flaws and to emphasize that Google is not the universal answer to all searching needs. Dunck also questioned the “Google fetish” on a list about “using the web to the benefit of libraries,” an odd challenge given that everyone who criticized Google on this list recognizes that it is a powerful web tool to extend reference services.

Here’s a sample of that “gleeful” approach from Patricia F. Anderson:

I like Google just fine, but it is far from doing everything I’d like it to do. I also see no reason for Google to try to do everything—some specialized tasks are best in a niche market, where the people who truly care about that will pay attention and take care of it. I don’t need or want one search engine that tries to be all things to all people.

That’s about as “bashing” an attitude as I saw, unless you count Roy Tennant’s continuing insistence that Google doesn’t do everything equally well and that researchers sometimes have legitimate reason to want to see items “less relevant” by Google’s algorithms.

Why do I call this iconoclasm? Because the critics don’t accept Google as an object of veneration; they refuse to treat it as a religion. When you assert that Google does answer everyone’s real-world needs and object to any criticism, you’re treating it as an icon. Iconoclasm is the sensible result.

Placing too much faith in Google? Consider the mixed messages in Ryan Eby’s contribution. In the first paragraph, Eby says he uses Google “nearly exclusively (for web search) ever since [around its inception]. In that time I don’t think I’ve ever had a completely failed search…” Eby “never go[es] past two or three pages” and regards wanting deeper results as something “spammers would love.” Later, after saying he uses Google nearly exclusively and it always works, Eby comes up with this:

I personally, and everyone I know, know that Google is not the one stop shop for all research (nor would I want it to be), though it does a damn good job at some things.

Mike Taylor came back with the claim that Google’s “big, big win” is that “its top hit (or second, or third) is nearly always the one you want.” Taylor has a lot better luck with Google than I do—I’m finding that the results I want are frequently down on the second page these days, with semi-relevant commercial stuff taking up the first page. Karen Coyle’s response to Taylor was that Google is “very good at…the retrieval of pages based on proper names…where there is a single obvious answer…. For other types of searches, Google doesn’t work so well. There’s no ‘conceptual’ searching.” (Names of people also tend to be more difficult to search as Google’s index grows.) Taylor agreed with this observation—but concluded that most people do most of their searching using specific known items. (How he knows this universal truth is beyond me.)

Oddly enough, I’m finding that to be less true as well, particularly in the hospitality area. When I want to find a hotel’s website, if I don’t know what chain it’s part of, Google can be frustrating. Sometimes—maybe half the time, maybe more—the hotel’s or resort’s website comes out on top. Other times, reservation systems and chambers of commerce and other entities have succeeded in linking their way to the top.

Later, after discussion of a specific problem noted below, Bill Drew raised the “Google bashing” cry, calling it “nitpicking about obscure features.” Jennifer Heise suggested that the tone of the discussion “has really come across as ‘why Google is bad,’” as opposed to the continued “why Google isn’t the universal solution” that I saw.

Roy Tennant responded:

I’m trying to more fully understand what Google is good at and what it isn’t good at. Given that Google is not very forthcoming on the help pages about limitations such as have been surfaced here by Bernie Sloan and others, this discussion seems to be one of the few places to get such information.

Bernie Sloan also responded: “My motivation is curiosity…trying to find out why things don’t seem to work quite as they seem to be advertised…” He also noted cases where he would legitimately want to go past 1,000 results.

The Link: Problem

The “no more than 1,000, based on our secret relevance sauce” issue is just that: An issue, not a failing. Most other open-web search engines have similar limits. Paid database vendors don’t have the luxury of telling users that they’ll get “some of” a result—but then, we’re paid, and should be held to a higher standard.

A slightly more serious problem arises with Google’s link-to searches: “link:[url],” a specific form of search offered by several open-web search engines. Bernie Sloan tried some of these searches and got results that made no sense to him. He asked Google about it and got this answer:

Our link search does not return a comprehensive set of results. The results will show a sample of the links that point to your page, but this list is in no way indicative of the link structure utilized by Google to formulate a page’s PageRank.

Sloan wasn’t asking how the PageRank was calculated. He wanted to see who was pointing to one of his own pages. That’s reasonable. I’ve done the same thing. Link counts also enter into any analysis of, for example, blogs (as in this issue’s major essay).

Roy wasn’t thrilled with this: “So, in other words. The Google ‘link:’ search is worse than useless. Useless because it fails to work as advertised and worse than useless because it will return just enough so one could imag[in]e it was working as they supposed (and as depicted by Google).” Andrew Mutch verified that Google’s link results “seemed to shrink even as lists of sites known to be linking to the resource had grown.” Bernie Sloan suggested that Google reword its explanation of “link:”: Instead of saying “Find pages that link to the page,” it could say, “Find examples of pages that link to the page.”

In this case, I don’t believe there’s a reasonable defense for Google. The link: feature is broken. Google should either turn it off or explain that it’s just a sampling.

Nobody’s Bashing Google

OK, that’s not true. Let’s say that nobody on Web4Lib was bashing Google; I can’t speak for certain elected ALA officials. What was going on was librarianship: Investigating resources to determine when they should and should not be used.

The problem here is that too many people see Google as all you need and all you’ll ever need. That’s dangerous. Librarians need to help their users with a broad range of indexes and search tools—and that means understanding the limitations of the leaders.

Creative Commons: Foe of Copyright?

Cites & Insights carries a Creative Commons license—one that reserves the right to profit from reuse of this material. Want to post this essay on your website? Feel free, as long as you’re not charging, you cite the original properly, and you note that the essay is protected by copyright. Want to distribute copies to your class? No problem. Want to sell it to others? Big problem unless you ask.

There’s now a Creative Commons search engine and Yahoo! searches can include a Creative Commons qualifier. With more than five million CC-licensed websites, that’s a good idea. Lots of writers and musicians have concluded that CC licenses make sense, encouraging new creativity while protecting the rights they want protected.

But any weakening of maximal copyright, even weakening chosen by a copyright holder, seems to offend some groups. Sometimes, it’s a matter of indirection: “My concern is that many who support Creative Commons also support a point of view that would take away people’s choices about what to do with their own property,” says David Israelite of the National Music Publishers’ Assn. (in a May 20 Reuters article, originally from Billboard). That’s guilt by association, even though CC is precisely a way for people to make “choices about what to do with their own property.” Michael Sukin of the International Assn. of Entertainment Lawyers makes a similar leap: “Lessig and his followers advocate a shorter copyright term.” True enough—but entirely unrelated to CC (which does not lobby for changes in copyright law).

RIAA is not among the CC-bashers: Its president says that artists might want to make their music freely available, and that the CC approach meets that need. But there’s always a counter-example: Andy Fraser, who wrote “All Right Now” for Free. He’s afraid that he might have used a CC license when he was a young songwriter if one had been available, and wouldn’t now have the royalties that pay for his AIDS treatment. His solution?

“No one should let artists give up their rights.” [Emphasis added.]

There it is in a nutshell: You should not be allowed to give something away, or even to give it partially away. So much for charity, the public domain, anything other than 100% “I’ve got mine” capitalism: No one should let someone else choose to reduce their own total control over something. This is, in its own way, as totalitarian a statement as any Communist could make, just at the other extreme.

Israelite doesn’t go that far, but loves to make broad claims: “Often when people give away their own property under a Creative Commons license, it is really an argument why others should be forced to give away their property.” I could say that’s meretricious bullshit, but “often” is just enough of a qualifier to escape that charge. Nobody I know who uses a CC license is making any such argument—and, to be sure, most of us don’t “give away” our “property,” but grant some rights to others while retaining others for ourselves.

Which, according to Andy Fraser, we should not be allowed to do. So much for freedom.

Lawrence Lessig wasn’t thrilled by the article and placed it in context: Billboard has run other pieces espousing this anti-CC view. An earlier piece, in which Sukin claimed that CC placed “U.S. copyright income” at risk, included a statement by the writer that CC’s “Founders’ Copyright” (no longer offered, but which established either a 14- or 28-year copyright in exchange for a $1 payment) was “urging creators to give up their copyright protection” for a buck. It was nothing of the sort, of course; it was a way of establishing a legal contract to limit that protection to 14 or 28 years.

With regard to this particular article, Lessig has a comment on Israelite’s assertion as to what CC users are really doing:

I love it when people tell me what my argument ‘really’ is. The whole premise of Creative Commons is that artists choose. We give licenses to creators. How exactly empowering creators is “really an argument why others should be forced to give away their property” is bizarre to me. By this reasoning, when Bill Gates gives $20,000,000,000 to help poor people around the world, that’s an argument for socialism.

Lessig goes on to note that the guilt-by-association link of CC supporters and shorter-copyright supporters is particularly specious: “The RIAA believes it is appropriate to sue kids for downloading music. They’re supporters of Creative Commons. Does it follow that Creative Commons supports suing kids for downloading music?”

Cites & Insights: Crawford at Large, Volume 5, Number 10, Whole Issue 66, ISSN 1534-0937, a journal of libraries, policy, technology and media, is written and produced by Walt Crawford, a senior analyst at RLG.

Cites & Insights is sponsored by YBP Library Services, http://www.ybp.com.

Hosting provided by Boise State University Libraries.

Opinions herein may not represent those of RLG, YBP Library Services, or Boise State University Libraries.

Comments should be sent to wcc@rlg.org. Cites & Insights: Crawford at Large is copyright © 2005 by Walt Crawford: Some rights reserved.

All original material in this work is licensed under the Creative Commons Attribution-NonCommercial License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/1.0 or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

URL: citesandinsights.info/civ5i10.pdf