Cites & Insights: Crawford at Large
ISSN 1534-0937
Libraries · Policy · Technology · Media


Selection from Cites & Insights 6, Number 6: Spring 2006


Net Media

Blogs, Google and Porn

Will my emailed announcements for this issue get clobbered by email filters? We’ll see. I’m not making a connection between blogs and porn (I haven’t looked at Fleshbot, so I can’t say whether such a connection exists)—but there’s definitely a recent connection between Google and porn.

The power law continues

Jean Véronis posted “Blogs: The last will never be the first” at his semi-bilingual Technologies du langage (aixtal.blogspot.com) on October 3, 2005. He notes his own Technorati ranking (4,724 based on links from 210 sites at that point): “Not bad for the old ego!...All the more so since the disproportion between languages means that Blogs in French are at somewhat of a disadvantage…” He looked at the relationship between Technorati rank and number of referring sites more closely, surveying “about one hundred blogs that go from one end of the ranking to the other.” No surprise: The relationship roughly follows a power law. He states that nicely, for those comfortable with logarithms: “If we put the ranking on one axis, and the number of sites on another, and we put the whole thing in logarithmic coordinates, we get a more or less straight line.” That’s typical of Zipf distributions and most other power-law cases. He notes where the “straight line” fails in his survey, however: Past about rank 10,000, the number of links drops faster than a power law would suggest. “In a way, there are ‘too many’ blogs who have few incoming links.”

He suspects that’s due to spam blogs or “splogs,” which one observer suggests may make up 60% of Blogger blogs. That might be true, although I’d expect the dropoff to come further to the right. In any case, he notes that the power law “can lead bloggers to despair” since it means a tiny minority of blogs get nearly all the references, “while the immense majority of blogs are not quoted (or perhaps even read) by anyone, or certainly by very few people.” I’m leery of the proposition that not being linked to means not being read, but never mind. The next two figures are startling but not at all improbable:

• “Only” about 777,700 blogs (as of last October) had references from two or more sites (that’s still an enormous number).

• Roughly 93% of all blogs aren’t referenced by anyone.

He believes “the inertia of the ‘big guys’” makes it difficult for anyone to climb very far up the power law. One exception, a blog that moved up to 90th place after a few months, got there largely on the strength of heavy Katrina coverage, and it slipped once the immediate crisis passed. Technorati now counts only links within the last six months, but since that includes blogrolls, I’m not sure it has much effect: The chances for a new blog—particularly one that’s not a “problog” (professional blog) setting out specifically to gather a big audience—to break into the hot 100 or even the warm 1,000 are pretty small.
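To make the six-month window concrete, here is a minimal Python sketch of ranking blogs by distinct referring sites, counting only links seen within a trailing window. It is a simplification under stated assumptions, not Technorati’s actual method: the link records, the example site names, the helper names `referring_sites` and `rank`, and the use of 182 days to stand in for “six months” are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical link records: (linking site, linked blog, date the link was seen).
LINKS = [
    ("siteA.example", "blog1.example", date(2005, 9, 1)),
    ("siteA.example", "blog1.example", date(2005, 8, 1)),   # same site again: counted once
    ("siteB.example", "blog1.example", date(2005, 3, 15)),  # older than the window: ignored
    ("siteD.example", "blog1.example", date(2005, 7, 10)),
    ("siteC.example", "blog2.example", date(2005, 9, 20)),
]

# Assumption: "six months" approximated as 182 days.
WINDOW = timedelta(days=182)


def referring_sites(links, as_of):
    """Count distinct sites linking to each blog within the trailing window."""
    cutoff = as_of - WINDOW
    sources = {}
    for site, blog, seen in links:
        if cutoff <= seen <= as_of:
            sources.setdefault(blog, set()).add(site)
    return {blog: len(sites) for blog, sites in sources.items()}


def rank(links, as_of):
    """Order blogs by referring-site count, highest first (ties broken by name)."""
    counts = referring_sites(links, as_of)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for position, (blog, count) in enumerate(rank(LINKS, date(2005, 10, 3)), start=1):
    print(position, blog, count)
# 1 blog1.example 2
# 2 blog2.example 1
```

A blogroll link sits on a page that keeps getting updated, so under a model like this it would be seen again with a fresh date on each crawl and never age out of the window. That is one way to read the doubt about how much the six-month limit helps newcomers.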

When he wrote this piece, you needed 552 referring sites to be in the top 1,000; 200 for the top 5,000; 120 for the top 10,000—and 20 for the top 100,000. A fair number of library-related sites belong in that broad category (a Technorati search in early April 2006 shows 39, but that’s only blogs whose owners have “claimed” them and used Libraries as a tag, and some of the most popular library blogs aren’t on that list), but few fit in the narrower categories (14 claimed-and-tagged blogs in the top 10,000, seven in the top 5,000—and none in the top 1,000).
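Véronis’s “more or less straight line” in logarithmic coordinates means the number of referring sites behaves roughly like C × rank^k for some negative exponent k. As a rough sketch, an ordinary least-squares line through the log-log values of the four thresholds quoted above estimates k. Least squares and the helper name `loglog_fit` are assumptions for illustration, and four points are far too few for a serious estimate.

```python
import math

# (Technorati rank, referring sites needed) for the thresholds quoted above, late 2005.
POINTS = [(1_000, 552), (5_000, 200), (10_000, 120), (100_000, 20)]


def loglog_fit(points):
    """Ordinary least-squares line through (log10 rank, log10 sites).

    Returns (slope, intercept), so that sites ~ 10**intercept * rank**slope.
    """
    xs = [math.log10(r) for r, _ in points]
    ys = [math.log10(s) for _, s in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = loglog_fit(POINTS)
print(f"sites ~ {10 ** intercept:.0f} * rank^{slope:.2f}")
```

On these four points the slope comes out at roughly -0.7, and the fitted line predicts just under 200 referring sites at rank 4,724, against the 210 Véronis reported for his own blog.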

Seth Finkelstein and Jon Garfunkel have written about these issues a lot, and I’ve discussed some of that writing in previous issues. Things haven’t really changed—and probably won’t. The power law is a broad phenomenon, and the “echo chamber” nature of blogrolls and fandom makes it more obvious within blogging than in a lot of other areas. I’m with Seth F. (in a February 14, 2006 post) in finding Technorati’s “authority” feature unfortunate—it equates popularity with authority and uses that as a way to further reduce visibility for less well-known blogs. Technorati bills it as “a good way to refine your search results”; I’d call it a good way to avoid distinctive and unusual perspectives. Those who regard USA Today as the most authoritative newspaper should love the feature.

If anyone seriously claims that popularity is synonymous with authority, I would assume that they agree that it’s authoritatively true that the earth was created a few thousand years ago with all existing species in place—after all, that’s certainly the most popular view in the U.S.

A curious New York article made the blog rounds: “Blogs to riches” by Clive Thompson, issue not stated in the web version (www.newyorkmetro.com/news/media/15967). It’s all about blogging as a way to make money. Naturally, it discusses Clay Shirky and the power law, the advantage of first movers, the inbred nature of the hottest blogs (“popularity breeds popularity”), and all that. But it seems to view blogging entirely in terms of business models: If you’re not trying to make money from your blog, why are you writing it? The article goes so far as to state that the Huffington Post, a relatively young blog that began with big-money backing and a “full-time staff of four” to actually post, “represents a sort of death knell for the traditional blogger.” The “new model for success” is corporate blogs—that is, blogs created by corporations. Here comes the new boss, same as the old boss.

Putting it charitably, this is narrow-minded horsepucky. It’s like saying that zines don’t exist because they’re not started by major publishers and generally don’t make money. If the only measure for success is making money—and maybe it is in New York—then it’s true that 99.9% of blogs are failures, certainly including mine and almost all other library-related blogs. But making money is not the reason most people blog, and most blogs are created and run by people, not corporations. This goes beyond the “long tail” aspects of most traditional media, where a specialized journal with a circulation of 1,500 may be quite as successful as a national magazine with a circulation of 1.5 million; this goes to the zine market, where a few dozen readers may represent success.

It’s probably important to say at this point that Seth Finkelstein and Jon Garfunkel are, as far as I can tell, right about what they call “gatekeepers”—within any given field, a relatively small number of bloggers commands most of the attention and, to some extent, dominates the topics under discussion. For relatively small fields, that may not be an awful situation: It’s not too difficult to break into the top hundred library-related blogs (or even the top fifty). But, as Finkelstein notes, that’s little solace if the fields you’re interested in aren’t narrow fields—if you’re interested in politics or the like. There, things seem to be getting worse: The chances of a single amateur being heard aren’t zero, but they’re no better than in traditional media.

Attack of the blogs

As I was thinking about the New York article, I realized that it wouldn’t have bothered me if it was in Forbes or Business 2.0. In those magazines, you’d expect money to be the only measure of success. I expected better of New York, but I’ve always been naïve.

Speaking of Forbes…an article with the title above (by Daniel Lyons) appeared at Forbes.com for November 14, 2005. It’s a doozy, starting with this lead: “Web logs are the prized platform of an online lynch mob spouting liberty but spewing lies, libel and invective. Their potent allies in this pursuit include Google and Yahoo.”

Pretty strong language, and it doesn’t say “A few web logs,” it says “Web logs.” Nice smear of an entire medium! It goes on with a supposed horror story: A blogger made nasty comments about the head of a company. I don’t know the facts of the story, although the reporting is slanted. For example, Lyons immediately labels the blogger’s campaign “long on invective and wobbly on facts,” but never identifies factual errors. Instead, he goes on to defame blogs once again:

Blogs started a few years ago as a simple way for people to keep online diaries. Suddenly they are the ultimate vehicle for brand-bashing, personal attacks, political extremism and smear campaigns.

Blogs are labeled as a “new and virulent strain of oratory.” Somehow, revealing the Kryptonite bike-lock situation is, I guess, a smear and brand-bashing. A marketing officer says “Bloggers are more of a threat than people realize, and they are only going to get more toxic.” A PR VP talks about the “potential for brand damage”—and a lawyer asserts that half of the attacks are “sponsored by competitors.” We’re told that Groklaw “exists primarily to bash software maker SCO Group…producing laughably biased, pro-IBM coverage; its origins are a mystery.”

Worse: Google and other “formidable allies” of the “online haters” “operate with government-sanctioned impunity.” Lyons appears to believe that any blog host should be responsible for ensuring every blog post is fair and accurate—which also implies that Comcast should be held legally responsible for assuring that, for example, Fox News is fair and accurate at all times.

It’s quite a story, all those innocent little corporations being smeared by those evil online haters. There are more examples, such as a CNN executive who used the word “targeted” in an off-the-record conference and had that word repeated in a blog. The executive “instantly and repeatedly denied the assertions.” Lyons doesn’t suggest that the assertions were false. It’s clear that, from Lyons’s perspective, it was fine for the CNN executive to lie about what he said, but evil for the “blog hordes” to keep “wailing away” with an apparently true but “off the record” statement. And he quotes a right-wing blogger who complains about left-wing bloggers hounding that White House “reporter” who allegedly worked as a male prostitute.

Deep links, EFF’s blog, had a charming response on October 28, 2005: “Attack of the printing press!” It takes pretty much the same wording but applies it to pre-Revolutionary War America and the role of printing presses in undermining the benevolent authority of the King. You’ll find it at www.eff.org/deeplinks/archives/004105.php.

I’m not defending anonymous libelous attacks, but that’s not what most of the Forbes story seemed to be about. Lyons does a fair job of smearing all blogs and suggesting that bloggers critical of corporations and businessmen are “online haters” and “virulent.” The remedy? Make the hosts responsible for assuring the truth and fairness of posts. Heck, for that matter, shouldn’t grocery stores that carry Forbes on their racks be required to assure that every article in the magazine is fair and true? After all, free speech can hurt corporations (and people, as if that mattered).

Seven deadly sins of blogging

I found this one at the GreatNexus webmaster blog on November 19, 2005 (www.greatnexus.com/blog/85.html). Pinyo Bhulipongsanon calls these the “seven worst things a blogger can do”: Use free blog hosting services, ignore the basic principles of good site design and usability, be the “jack of all trades” (blog on more than one topic), don’t post regularly (the writer argues for at least one post a day), write badly, spam and steal, and fail to establish a personality. The post runs four pages (followed by 20 pages of comments); these are just normalized versions of the primary points.

Once again, it depends on your purpose—at least as far as the third and fourth points go. (It’s hard to argue for bad site design or usability, spamming, stealing, bad writing, or impersonal blogging—and the point about free host services is a tricky one.) Yes, if you want a big audience of people going directly to your blog, so you can get the big ad revenue, you have to post every day. But for many of us, with aggregators, that’s not what blogging is about. Most amateur bloggers—most “real bloggers,” if you will—want to find their appropriate audiences, people who will appreciate or be engaged by what they have to say. Also, yes, sticking with a narrow topic may make you more of an Expert on that topic, and if you’re blogging to fish for speaking or writing invitations, that’s a good thing. On the other hand, Boing Boing is still the #1 blog (as far as I know), and that’s not what I’d call a focused blog. Some of us like to be surprised by the blogs we subscribe to; we’re interested in what people have to say, and it doesn’t hurt for new areas to show up.

Comments are all over the place—some agreeing, some pointing out problems with the blog (it uses a fixed layout, always amusing on very small and very large browser windows), some adding new points. “Not answering [to] comments” comes up as one new sin. One commenter specifically notes that it all comes down to “just where you intend to go with blogging”—and that for a personal journal, hosted blogs make good sense.

Writing and authority

“Momus” contributed an excellent piece at Wired News on November 29, 2005: “Blogging with a wooden tongue.” It’s about PR blogs—“official websites” that really violate the last rule in “seven deadly sins.” Telltale signs of a wooden-tongue blog: Content claimed to be written “by someone powerful who’s obviously too busy to write a blog” that reads “like it’s been phoned in”; the blog never raises controversial topics; the blogger is “incongruously humble and modest.” Momus provides an example, apparently from the curator of an exhibition but with none of the flavor of what goes into mounting an exhibition.

InfoTangle’s blog/article for February 20, 2006 is “Authority in the age of the amateur” (infotangle.blogsome.com; navigate from there), a six-page article (four pages plus 28 endnotes) that discusses some concerns raised by critics of blogs: They lack filters, they lack authority, bloggers are amateurs. I’m not entirely convinced by some of the answers—does appearing on lots of blogrolls really constitute authority?—but it’s a thoughtful discussion. After discussing whom we trust these days, the author offers suggestions for judging the worthiness or authority of a blog. She suggests that librarians use “their unique expertise to evaluate and recommend authoritative blogs” by creating OPML-based reading lists. An interesting approach. I wonder which librarians I would trust to recommend “authoritative” blogs?

John Scalzi posted “Writing tips for non-writers who don’t want to work at writing” at Whatever on February 12, 2006 (www.scalzi.com/whatever/004023.html). It’s a nice casual discussion, but I believe he gets some of the punctuation guidelines wrong (even wronger than my frequently poor punctuation). He does recognize that, while brief paragraphs may be good, it’s easy to overdo it—how many online sites have nothing but single-sentence paragraphs, with meaning chopped up into Little. Separate. Bits? “Learn to friggin’ spell” makes the point that every spelling error cuts 5 points from your “apparent IQ”—and that every mistake of the “there, they’re, their” type—the ones spell-checkers won’t get—drops your apparent IQ by 10 points. He notes how many MAs and PhDs are prone to such errors. He also suggests that you not use words you don’t really know (particularly slang) and offers a number of other points, starting and ending with “speak what you write”—the idea that good writing should emulate speech. Fair warning: While the post is only 6.5 print pages long, comments go on for 61 more pages.

Lori Mortimer offered a response of sorts at Blogcritics.org on February 15, 2006: “One simple rule for improving your writing” (blogcritics.org/archives/2006/02/15/180927.php). I suppose the “one…rule” is the first of four guidelines Mortimer says Scalzi missed: Use the active voice. The others: Use simple, strong verbs; sleep on it; and get feedback from at least two people. Mortimer seems to advocate these ideas—even the last two—for blogs, where they strike me as improbable and possibly inappropriate. (How many of us “get feedback from at least two people” before posting online, or even before submitting something for print publication? Every time?)

Mortimer dissects Scalzi’s punctuation guidance extensively. She’s probably right in some areas—but some of her advice is more confounding than helpful. Scalzi’s advice on periods: “When you’re writing down a thought and you’re at the end of that thought, put a period.” That’s way too simple—but what can you do with Mortimer’s counsel? “The only way to know where to put a period is to know where a sentence ends. And the only way to know how a sentence ends is to learn the parts of speech, usage, and sentence construction.” Gee, that helps.

Google and Friends

Gary Price posted “Keeping yourself out of web and other databases” at Search Engine Watch on October 3, 2005 (blog.searchenginewatch.com/blog/051003-152112). He notes a Wired News article about a person who “values her privacy” and is trying to keep herself out of Google. “We’ve seen stories like this before.” Price offers a reality check.

First, it’s not just Google—and based on my own experience, Google’s spiders are not the most aggressive these days, although they used to be. (In January 2006, Yahoo! Slurp hit Cites & Insights 5,123 times; Googlebot a mere 2,340, not all that far ahead of MSN Robot at 1,528. The rest—and there are a lot of spiders out there—top out at 498 hits during the month. But then, over at Walt at Random, two different Googlebot “robots” seem to account for more than 10,000 hits in all, while Yahoo! Slurp accounts for a mere 4,116 and msnbot 3,585.) “Staying out of Google” will only keep you anonymous from people who’ve never heard of other search tools.

Beyond web search engines, as Price notes, there are lots of other tools to find out about people, both within the open web and more so within “deep web” databases. For a few bucks, you can get aggregated information from several services.

If you’re “out there” it’s not Google’s fault—although it’s true that Google and competitors could be more up-front about ways to keep material out of the databases. Beyond that, I think Price is just barely right: “Trying to remain completely and totally private in the United States might be possible. Very difficult, but I guess possible.” Just barely possible, and probably not worth the effort.

For most businesses and bloggers and websites, the desire is different: To be as prominent as possible in Yahoo!, Google, MSN, and the rest. Some of them want that prominence to be selective—and one porn site, Perfect 10, seems to be having some success in suing Google over the issue. On February 22, 2006, Judge Howard Matz issued a preliminary injunction against Google’s display of thumbnail images from Perfect 10 within Google Images. I’ve seen thoughtful discussion of the findings and issues from Seth Finkelstein at Infothought, Fred von Lohmann at Deep links, “kim” at LawFont.com, and—briefly—Alan Wexelblat at Copyfight. There’s also commentary at Sivacracy.net, in the usual combative tone that site now seems to take in all matters Google-related.

I’m assuming here that Perfect 10 doesn’t have a “no-crawl” file on its site, since Google would honor such an instruction. I’m guessing the porn site wants to be discovered via search engines. But Perfect 10 has a business for cell-phone users who like their nekkid women on the very small screen, selling pictures through Fonestarz that may not be much larger than Google Images thumbnails. So Perfect 10 claims Google is interfering with an existing business use, helping to undermine fair use claims. They also claim Google’s potential sharing of ad revenue with infringing sites that copy the Perfect 10 images, via AdSense, constitutes commercial gain.
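The “no-crawl” file here is a site’s robots.txt. As a sketch of how a well-behaved crawler honors one, Python’s standard `urllib.robotparser` module can evaluate a hypothetical file. The rules and the example.com URLs are illustrative assumptions, not anything from Perfect 10’s actual site; Googlebot-Image is the user-agent token Google uses for its image crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Google's image crawler out entirely,
# and keep every crawler out of a members-only area.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

# The image crawler is blocked everywhere.
print(parser.can_fetch("Googlebot-Image", "https://example.com/gallery/1.jpg"))  # False
# The ordinary web crawler falls through to the "*" rules: gallery allowed...
print(parser.can_fetch("Googlebot", "https://example.com/gallery/1.jpg"))  # True
# ...but the members area is not.
print(parser.can_fetch("Googlebot", "https://example.com/members/x.jpg"))  # False
```

Blocking only Googlebot-Image would keep thumbnails out of image search while leaving ordinary web search untouched, which is one form of the selective prominence some sites want.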

What’s bad here? The court’s broad definition of “commercial use” could affect Google Book Search—and a lot of what Google and its competitors do. Fred von Lohmann includes a fascinating tidbit on the “market interference” portion of this: “As for Fonestarz, I don’t think the court was adequately sensitive to indications that the arrangement was a sham concocted for this litigation (the court notes that the license was not entered into until after Perfect 10 sued Google).”

Otherwise, not as much as you might think. The court rejected the idea that Google linking to an infringing site itself constitutes an infringement, even if that link is the same-page display you find in Google Images. The court also rejected the idea that Google should be held responsible for infringing sites that it links to because it “created the audience” for those sites. And the court couldn’t buy Perfect 10’s notion that visiting an infringing website (and, thus, temporarily caching copies of infringing material) inherently constitutes infringement—a notion that would devastate the web as a whole.

The injunction is preliminary, subject to appeal—or to Google making an arrangement with (“paying off” is such an ugly term) Perfect 10.

A couple of other Google-related items may be worth noting. You’ve seen one of them, although you may not realize it. As John Battelle noted in September 2005 (battellemedia.com/archives/001889.php), Google did two things: Claimed that its index was three times bigger than any competitor’s—and stopped showing an index size claim on the home page. Since Yahoo! and MSN (the two primary competitors for web search) are both portals, their claimed index sizes aren’t featured prominently in any case; now, you only see such claims on smaller sites such as Exalead. Yahoo! famously claimed last summer that its index was bigger than Google’s. The Google claim is as unprovable as the Yahoo! claim, but what’s new is Google changing to “most comprehensive search engine by far” without numbers—and, as before, without any good way to test the assertion.

As Battelle points out, stopping the numbers race may make sense, since it returns the focus to relevance. My own sense, for my own searches, is that “relevance” has gotten worse for all three engines; that any engine handling more than a couple billion sites seems to include so much noise that it’s increasingly hard to ferret out the signal. That’s partly because I’m not just looking for the “top” result—I’m looking for the useful results within the range the engine is willing to show me. My recent experience? Sometimes MSN Search is better. Sometimes Yahoo! is better. Sometimes Google is better. I don’t find any of them consistently best. But that’s me.

Incidentally, Battelle got a response from Yahoo! about Google’s decision to stop mentioning numbers:

We congratulate Google on removing the index size number from its homepage and recognizing that it is a meaningless number. As we’ve said in the past, what matters is that consumers find what they are looking for and we invite Google users to compare their results to Yahoo! Search at http://search.yahoo.com.

To which Battelle responds, “Why on earth, then, did you announce that 20 billion number in the first place?” But he’s happy that “this is the end of it.”

Finally, on the off chance that any C&I reader doesn’t already know about it, I should mention Google’s Newsletter for Librarians, launched in December 2005. Full transparency: I accepted an invitation to write an article for a future issue; I don’t yet know which future issue. You can sign up for the newsletter at www.googlelibrarian.com, which also links to a blog. The first issue featured Matt Cutts’s “How does Google collect and rank results?” It’s a little simplistic (and, of course, it doesn’t reveal any of Google’s proprietary ranking mechanisms), but it’s a good start.

Cites & Insights: Crawford at Large, Volume 6, Number 6, Whole Issue 76, ISSN 1534-0937, a journal of libraries, policy, technology and media, is written and produced by Walt Crawford, a senior analyst at RLG.

Cites & Insights is sponsored by YBP Library Services, http://www.ybp.com.

Hosting provided by Boise State University Libraries.

Opinions herein may not represent those of RLG, YBP Library Services, or Boise State University Libraries.

Comments should be sent to waltcrawford@gmail.com. Comments specifically intended for publication should go to citesandinsights@gmail.com. Cites & Insights: Crawford at Large is copyright © 2006 by Walt Crawford: Some rights reserved.

All original material in this work is licensed under the Creative Commons Attribution-NonCommercial License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/1.0 or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

URL: citesandinsights.info/civ6i6.pdf