Perspective
Looking at Liblogs: The Great Middle
Two years ago, it was plausible (if extremely foolish) for a high-profile librarian to make fun of blogs in general, including blogs from libraries and by library people. A year ago, there were many such blogs, but relatively few with substantial readerships. Now, there are certainly more than a thousand blogs from libraries and by library people. At least three ALA divisions have blogs; that number will grow.
I’m guessing most readers already subscribe to, or know of, the big-name liblogs—those that have been around since the last century, those that come from high-profile people, those that have made names and reputations for themselves.
Last year’s most widely read Cites & Insights essay was almost certainly Investigating the Biblioblogosphere (C&I 5:10, September 2005, http://citesandinsights.info/v5i10b.htm). It’s even the second most widely read essay this year, albeit far behind the favorite (Library 2.0 and “Library 2.0,” with almost 16,000 total downloads as of early August 2006).
That Perspective, inspired by Jon Garfunkel’s “Social Media Scorecard,” was an informal investigation of 60 blogs by library people, blogs I perceived as having wide “reach.” I identified 238 initial candidates, measured the “reach” of those blogs, and came up with a set of 60. I ran various quantifiable tests (metrics) on those 60 blogs, reported standout blogs for each metric, and added one or two brief paragraphs discussing each of the 60 blogs. I listed the blogs in descending order by reach, as I measured reach. Most, if not all, of those blogs are among the “big-name liblogs,” the ones you’re likely to know about.
Along with a gratifying number of positive responses to the piece, I heard criticism on a variety of grounds—especially regarding the significance of “reach” and the hazards in ranking blogs. It didn’t help matters that some people linking to or discussing the article referred to it as “Walt Crawford’s Top 50 Blogs” or something of the sort—even though I tried to make it clear that this was a Top 50, not the Top 50. That turns out to be a distinction requiring more explanation than it’s worth.
By the time the dust settled, I knew I wanted to do something similar this year—but I wanted to do it differently. In the past few months, thinking about blogs has become more complex, especially as feeds and aggregators have become so easy and popular.
For example:
• Feeds eliminate the need to blog all the time in order to be visible. With aggregators, “blogging to be blogging” can be a danger: Your posts seem forced or repetitive and may encourage people to unsubscribe.
• For those who aren’t out to be A-listers or politicians, size of audience has diminished in importance. The hope now is to find the right audience, which might be anywhere from half a dozen friends to a few thousand strangers.
• More than one commentator has suggested that the most interesting blogs are in the great middle—blogs with more than a handful of readers but not so popular as to carry the burden of popularity.
I’m not fond of “biblioblogosphere”—and even if the term’s OK, I don’t think it fits this group of blogs. For one thing, there are loads of “biblio” blogs about books quite apart from the realm of blogs written by library people. For another, I continue to exclude official blogs, those explicitly identified with a library or organization. I’m using liblogs, not because it’s catchy but because it’s short. If you want to think of this essay as “Biblioblogosphere 2: Avoiding the A List,” be my guest.
I used Walt at Random as a support mechanism for this year’s look. When I decided to do it, I invited direct feedback as follows (excerpted):
“Want to opt out? If you just don’t want your blog involved at all, here’s what you need to do: Send email to citesandinsights@gmail.com or waltcrawford@gmail.com with the subject heading Liblog optout, and give the name of your blog and an email address I can use to verify that it’s you and not someone else. You don’t need to provide a reason…If you opt out, your blog just won’t appear. Period. Email should reach me by July 15, 2006.
“Usage numbers? I’d like to try to correlate Bloglines subscription counts with direct/indirect readership. You can help, if you have access to stats for your weblog… Here’s what you can do to help:
“Find two figures for May 2006: The average sessions per day (or total sessions: I can divide by 31), which is almost always easy to find, and the unique visitors during the month–or “unique IP addresses” if that’s what you have…Email should reach me by July 31, 2006.
“That’s it. I hope not to get any of the first category of email, but will honor whatever I do get (and can verify). I hope to get at least 10-15 of the second category…
“Thanks. Oh, by the way, if you have a liblog–not an official library blog–that you think I’ll overlook because it’s not listed in any of the typical places, you could also send me appropriate email.”
I received four opt-outs. I’m confident none of them were hoaxes. Given the decisions I’d already made about this year’s project, the opt-outs didn’t affect the final list of blogs. I received 15 responses on usage numbers, some offering only one of the two numbers. I discuss those results near the end of this Perspective, under “Direct Reporting: Null Results.” I also received one or two names of blogs that I would have missed otherwise and a couple of suggestions that were already on my Bloglines list.
Last year, I was looking for liblogs with the broadest reach—and with several other criteria. The blogs had to be by one to four self-identified library people (not “official library” blogs and not large-group blogs), have at least one posting in 2005, have at least one RSS/ATOM feed—and, although not explicitly stated in the Perspective, had to be in English.
I fine-tuned criteria this year. There’s no limit on group size, but I eliminated official blogs of all sorts. There had to be at least one posting during March through May 2006. I require at least one feed that Bloglines can recognize—but I dropped the English-only criterion. I’ve been loose about the “official” criterion (is BlogJunction an official blog?). Dropping the English-language criterion was easier because I also decided not to comment on the voice and primary focus of individual blogs. You can run metrics on a blog without being able to read it.
The biggest change has to do with reach and readership. To the extent that an observer can gauge reach and readership, I wanted to avoid the “A list” in favor of a broader group of liblogs in the Great Middle. What’s the Great Middle? It’s the middle of the power-law curve: Blogs with more than a handful of readers, which garner some attention but aren’t among the most popular in the field. There’s no clear definition of that middle, just as there’s no clear definition of reach or readership. In this case, it’s a little less than half of the liblogs that meet other criteria (and that I could find), omitting roughly the most widely read sixth and least widely read third. “Roughly” is the right word in all cases.
If you’ve been following Walt at Random, you can skip most of this section and go on to “Results and Metrics.” For that matter, if you don’t care how I arrived at a sample that I do not claim to be statistically meaningful, complete, or anything other than “a big chunk of the Great Middle,” you can skip this section (although I throw some commentary in along the way).
I started with my existing Bloglines library-related set, which had grown to 240 subscriptions. My guess was that this set included most of the most widely read English-language blogs and a decent sampling of slightly more obscure ones. Bloglines has made it much easier to determine the sum of all feeds: When you click on the “Sub with Bloglines” button on the Firefox bookmarks toolbar, it shows the subscription count for each feed. I recorded the sum of all feeds (except comment-only feeds) for each blog, prepared a first-cut spreadsheet, and removed roughly the top 10% and bottom 10%, leaving 200 blogs with 16 to 689 Bloglines subscriptions.
I then checked blogs from three sources, only one of them the same as last year’s. I didn’t use LISFeeds because the new user interface has no obvious way to print out a list. I didn’t use the Libdex Library Weblogs list because it seems to be stagnant. I kept the Dmoz/Open Directory subdirectory of LIS Weblogs, although it’s somewhat stagnant as well. The most important new source is the LISWiki Weblogs page, but I also downloaded the PubSub libraries list.
I began with LISWiki blogs not already in my Bloglines list, then went through Open Directory blogs that didn’t show up elsewhere, and finally picked up new items from PubSub. I added blogs with 16 to 689 total Bloglines subscriptions. While my intent was to avoid anything without postings during the March-May 2006 period, the subscription process made it easy to pick up extras.
If there’s a broad claim I’m willing to make based on this process, it’s that Bloglines users (among library types) tend to prefer Atom feeds: the Atom subscription count was usually (not always) higher than the count for alternative feeds from the same blog.
Here’s what I found:
• LISWiki Weblogs page, blogs new to me in the Individual and non-English sections: 112 had fewer than 16 subscriptions (a bunch more, including a slew of Persian blogs, had no subscriptions). Seven had more than 689 subscriptions. Sixty-three showed no post later than February 28, or had no feed, or weren’t really blogs. I added 149 new blogs to the candidate pool.
• DMOZ/Open Directory, those not looked at in the first two steps: Seven had too few subscriptions. None had too many. Twenty-three were missing in action, had no feed, or were otherwise ineligible. I added four blogs to the pool—but I’d already considered most of these last year.
• PubSub library list members that hadn’t already been looked at, plus blogs whose creators sent me information about them: 18 had too few subscribers, none had too many, eight didn’t have feeds or lacked contemporary posts. I added another 15 blogs to the pool.
That left me with 368 candidates—far too many even for this expanded essay. I checked something like 650 liblogs in all, of which 554 are still active, aren’t official or corporate, have an RSS feed, and have at least one subscription.
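The tallies above can be checked with a few lines of arithmetic. The following is a minimal bookkeeping sketch: the counts come from the text, while the dictionary layout and the names `sources`, `candidate_pool`, and `blogs_examined` are illustrative assumptions, not part of the original spreadsheet.

```python
# Counts taken from the text; the structure and names are illustrative only.
STARTING_SUBSCRIPTIONS = 240   # existing Bloglines library-related set
KEPT_FROM_START = 200          # after trimming roughly the top and bottom 10%

# For each additional source: too few subscriptions, too many, ineligible, added.
sources = {
    "LISWiki": {"too_few": 112, "too_many": 7, "ineligible": 63, "added": 149},
    "Open Directory": {"too_few": 7, "too_many": 0, "ineligible": 23, "added": 4},
    "PubSub and suggestions": {"too_few": 18, "too_many": 0, "ineligible": 8, "added": 15},
}

def candidate_pool(kept, sources):
    """Blogs kept from the starting set plus blogs added from each source."""
    return kept + sum(s["added"] for s in sources.values())

def blogs_examined(starting, sources):
    """Every counted blog, whatever its outcome (zero-subscription blogs are not counted)."""
    return starting + sum(sum(s.values()) for s in sources.values())

pool = candidate_pool(KEPT_FROM_START, sources)
examined = blogs_examined(STARTING_SUBSCRIPTIONS, sources)
print(pool)      # 368
print(examined)  # 646
```

The 646 counted blogs are consistent with "something like 650" once the uncounted zero-subscription blogs are allowed for.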
Five hundred fifty-four, as compared to 231 last year. Even with non-English blogs included, that represents a doubling in liblogs, or at least a doubling in visible liblogs. LISWiki makes an enormous difference (and I hope people keep adding liblogs to LISWiki!). I’m not sure how many of those 554 started within the past year. Of the 213 in the final study, 59 (28%) began after June 2005 (the cutoff for last year’s study).
I had to cut more. There are many ways to cut, and I didn’t find “natural breaks.” Using “half taken from the upper middle” as my target, I eliminated the most widely subscribed 90 and least widely subscribed 183 from the original 554, leaving 281 blogs with 19 to 196 Bloglines subscriptions. Note that only 25 of last year’s candidates had more than 196 Bloglines subscriptions. Unquestionably, liblogs across a broad range have become more popular.
I wanted to cut that list a little more, but I needed more than Bloglines. I did the same set of “reach” measures as in 2005, with one minor tweak and one significant addition. Bloglines OPML output translates directly into a spreadsheet that made it easy to search for links: Highlight the URL cell, copy, paste into the “link:” search, and go. As with last year, I checked link: counts in Google and MSN Search—but this time I used Yahoo! instead of AllTheWeb. I then added one figure that I believe is more meaningful than any of these three: the visible result from Yahoo!
What’s the visible result? The number of sites Yahoo! shows you with its “very similar” algorithm active. Anyone who’s spent time looking at web search engines knows that any result count greater than 1,000 represents a claim, because the search engines won’t show you more than 1,000 results. In practice, deduped results usually aren’t anywhere near the 1,000-result limit. Yahoo! will show 100 results per page and give an accurate count of results displayed on the last page it shows; it also offers larger link: results than the other two engines. That made it an obvious choice. The deduped number is nice because it reduces the echo-chamber effect of blogrolls, where the presence of a blog on another blog’s blogroll may result in hundreds of apparent links, only one of which is significant.
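To illustrate why a deduplicated count dampens the blogroll effect, here is a minimal sketch that collapses inbound links by host name. It is not Yahoo!'s "very similar" algorithm, whose details are not described here; the function name `distinct_linking_sites` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def distinct_linking_sites(inbound_urls):
    """Count each linking site once, however many of its pages carry the link."""
    hosts = set()
    for url in inbound_urls:
        host = urlsplit(url).hostname  # None when the URL has no network location
        if not host:
            continue
        if host.startswith("www."):
            host = host[4:]
        hosts.add(host)
    return len(hosts)

# Hypothetical data: one blog whose blogroll appears on every archive page,
# plus a single link from a second site.
inbound = [f"http://blog-a.example.org/archive/{n}.html" for n in range(300)]
inbound.append("http://www.blog-b.example.net/2006/05/post.html")

print(len(inbound))                     # 301 raw links
print(distinct_linking_sites(inbound))  # 2 linking sites
```

Grouping by host is only one possible rule; a real search engine may collapse results on other signals, such as near-identical page content.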
Consider the three raw link: results, noting that I had already removed 90 blogs likely to have very high link: results (and nearly 200 likely to have relatively low link: results):
• Google: The highest number was 5,370 (compared to 9,430 last year); several had no link: results at all.
• MSN: The highest number was 34,669, compared to 76,675 last year; again, several had no link: results.
• Yahoo!: Every candidate had at least five Yahoo! links; the high was 179,000 (compared to 449,000 last year).
These numbers don’t mean much of anything, particularly given the skew of blogrolls. What can you do with ratios of 2,600:1 (Yahoo!) even after you’ve eliminated extremes? I’d concluded that last year’s Reach numbers weren’t very good. Using that same formula yielded a smaller range this year—from a high of 13,497 to a low of 84, a ratio of 161:1, considerably smaller than last year’s 7,778:1 but still too broad given that this year’s 161:1 omits the liblogs likely to have the highest figures. A slightly modified version of last year’s formula, using adjusted deflators, yielded a range of 10,590 to 82, a ratio of 128:1.
The “visible Yahoo!” number had a good feel to it (and an upper limit of 1,000). I calculated a new Reach factor, adding the visible Yahoo! count to twice the Bloglines subscriber total. That yielded a high of 1,387 and a low of 51—a ratio of 27:1.
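The new Reach factor is simple enough to write down directly. A minimal sketch follows; the function and parameter names are chosen for illustration and do not come from the original spreadsheet.

```python
def reach_factor(visible_yahoo, bloglines_subscribers):
    """Visible Yahoo! count plus twice the Bloglines subscriber total."""
    return visible_yahoo + 2 * bloglines_subscribers

def spread(high, low):
    """Ratio of the highest value to the lowest, rounded to a whole number."""
    return round(high / low)

# Largest value possible under the limits mentioned in the text:
# at most 1,000 visible results and at most 196 Bloglines subscriptions.
print(reach_factor(1000, 196))  # 1392
print(spread(1387, 51))         # 27
print(spread(13497, 84))        # 161
```

The theoretical ceiling of 1,392 sits just above the reported high of 1,387, which helps explain why this measure compresses the range so much more than raw link counts do.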
This time, there were obvious outliers. By dropping nine blogs with new Reach factors above 700 and 21 below 70, I had a ratio of only 10:1 for a candidate pool of 251 blogs.
There’s some indication that Bloglines subscriptions account for about half of all aggregation. If that’s true, the final candidate pool includes blogs with roughly 40 to 400 readers through feeds—and maybe an equal number of direct readers (only the bloggers would know!). That’s a good “medium-sized” region for library blogs—enough readers to be interesting, but not a mass readership even within librarianship.
The next step was metrics and individual examination. Last year, I did detailed work on 60 blogs; this year, I dealt with more than four times as many. Somehow, it all worked.
In the process of running metrics, I removed 38 blogs for various reasons. Some turned out to be dead (I’d missed them in the first round), with no posts after February 2006. Some were official blogs and a couple had no real relationship to libraries or librarians. A few began in June 2006, making them too new for this study—and a couple had posts before March 2006 and posts after May 2006, but none during that quarter. Two distinctly worthwhile blogs—MaisonBisson.com and rawbrick.net—are set up in such a way that I found it impossible to take any metrics. I reluctantly deleted these two from the pool, but people looking for new blogs in this essay should definitely consider those two as well.
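The successive cuts reported so far can be tallied in one pass. This is a minimal sketch using the counts given in the text; the list layout and the name `apply_cuts` are illustrative assumptions.

```python
# Each step: (description, number of blogs removed). Counts are from the text.
ACTIVE_LIBLOGS = 554
cuts = [
    ("most widely subscribed", 90),
    ("least widely subscribed", 183),
    ("new Reach factor above 700", 9),
    ("new Reach factor below 70", 21),
    ("removed while running metrics", 38),
]

def apply_cuts(start, cuts):
    """Return the running pool size after each cut."""
    sizes = []
    remaining = start
    for _reason, removed in cuts:
        remaining -= removed
        sizes.append(remaining)
    return sizes

sizes = apply_cuts(ACTIVE_LIBLOGS, cuts)
print(sizes)  # [464, 281, 272, 251, 213]
```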
The final pool represented in this article is 213 blogs, almost as many as in Pew’s latest study of bloggers. The modified Reach numbers narrowed during the metrics pass. Although the Bloglines range stayed 19 to 196, the modified Reach ranged from a low of 72 to a high of 688, a ratio of 9.56:1. As to the metrics used:
• I dropped a couple of last year’s metrics. I’m so discouraged about popularity contests and echo-chamber effects that I didn’t even look at BlogPulse or Technorati or similar sites. I also didn’t worry about the number of link-based postings or the size of the blogroll.
• I recorded the highest number of comments for any single post (and adjusted one blog’s results for a serious spamment problem). I also recorded the post title for that post (if the blog had comments) and for the “first March post,” where “first” depends on how archives are arranged.
• In a second pass, I looked at the number of direct and indirect blogrolls (“indirect blogrolls” being links to other pages or to Bloglines), whether the typeface for blog entries was sans serif or serif and whether text was justified, what program was used for the blog, whether the blog has a Creative Commons license, and unusual color combinations. Those results are summarized below.
I don’t use Reach at all in these metrics, whether last year’s Reach or the new, more plausible, Reach factor. I also don’t use the Bloglines subscriber count. If you wish to explore those metrics for the 213 blogs in the final group, the spreadsheet is available at http://waltcrawford.name/logs6reach.xls. That’s the last time Reach will be mentioned in this article. Individual blogs are discussed in alphabetic order.
What we have here are 213 liblogs from a population of around 550 active liblogs represented in the directories and wikis I looked at. I believe the set is broadly representative of Great Middle liblogs. This is a big bunch of liblogs, almost as many as the total possible candidates a year ago.
It’s also a good bunch of liblogs. Sturgeon’s Law simply does not apply to liblogs in the Great Middle. Sure, a few liblogs are badly written, but not that many. Sure, a few are primarily personal, but “personal” blogs can and do become professional blogs overnight, and I found few that are only personal. (Not that there’s anything wrong with personality or personal blogs; I agree with Steven Cohen that it’s good to see the personality and, as they find appropriate, personal life of a blogger represented alongside other concerns.) A handful of blogs here seem primarily concerned with right-wing political slants on issues, and a handful color everything with left-wing politics. There’s writing some people might consider offensive, and in at least one case I found that writing revealing, thoughtful, and concerned.
I was deeply offended personally by one (and only one) post within this group of blogs during this March-May period, as the blogger dismissed 27 years of my life and the creative work of several dozen colleagues with a flip sentence or two. That blogger appears to be young; with luck, he’ll grow up. (Of course it was a he.) I found many bloggers with whom I’d disagree on some issues. Not incidentally, I’ll be slow to remove those bloggers from my Bloglines list, because they make me think.
If these liblogs are any indication—how can they not be?—library people who blog are mostly thoughtful, intelligent and caring. Not always, not equally, but on the whole it’s a safe bet. I started out with too many liblogs in Bloglines. I now have almost two hundred more—and, other than the ones I can’t read, it will be difficult to reduce that list. It may be easier to off some of the big names…or not.
Here’s the new set as reflected in the final 213:
• Starting date (from internal evidence): None of this year’s candidates started before 2001, and only two began in 2001. Fourteen began in 2002; 41 in 2003; 51 in 2004; 47 in the first half of 2005; 35 in the second half of 2005; and 23 in the first half of 2006. The median is December 2004—as of the end of June 2006, half the blogs are at least 18 months old, half younger.
• Frequency of posts, March-May 2006: The most prolific blog had 371 posts during the three-month period. Several had only one post. The average was 45 posts, but the median was 27.
• Number of comments, March-May 2006: 41 of the blogs either don’t allow comments or had none during this period. The highest comment count (excluding spam) is 798. This year’s average is 35 comments, but the median is only 10.
• Comments per post: I saw no reason to exclude zero-comment cases. The high this year is 9.63 comments per post. The average is 0.93 comments per post; the median, 0.42.
• Total length of posts, March-May 2006: I was unable to determine the total length on 15 of the blogs because of the way they’re archived or stored. Among the 198 where this metric and the next were feasible, the average is 11,412 words. The median is 5,843 words. Consider the standouts: Two blogs nearly tied, at 144,809 and 144,504 words!
• Average length of posts: The average post this year (the average of all averages) is 268 words long; the median, 225. This year’s most essay-oriented blog averages 1,463 words per post. At the other extreme, three bloggers averaged fewer than 51 words per post.
• Blogrolls: 111 have blogrolls of some sort (not always with that name) on the home page (52%); another 18 (8%) have indirect blogrolls (usually a link to a Bloglines subscription). 40% do not have blogrolls.
• Typeface and alignment: 136 (64%) use sans serif type set left-aligned. Fifty-four (25%) use serif type, left aligned. Twenty-two (10%) use sans serif but with justified type. One blog uses serif text, justified.
• Color usage: Most blogs—65%—use black text on white or near-white backgrounds. Forty-seven use colored backgrounds that are light enough not to impair readability very much (although some people may find polka-dot backgrounds unsettling). Fourteen (6.6%) use dark colored backgrounds (six of them black), usually with white type, which may be trendy but encourages the reader to stick strictly with feeds.
• Creative Commons: 49 blogs (23%) have CC logos on the home page yielding some copyright. I didn’t check the actual license in each case, but I believe BY-NC and BY-NC-SA to be most common.
• Software: 106 (50%) of the blogs use Blogger, almost all of them freely hosted on Blogspot. WordPress comes in a strong second, with 52 blogs (24%). Six Apart products come in third and fourth: 19 Movable Type (9%) and 12 TypePad (6%). In 26 cases, the software was either a lesser-known product or not identified on the home page.
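Two of the summary figures above can be reproduced from the counts in the list: the period containing the median starting date, and the typeface percentages. This is a minimal sketch; the period labels and the helper names `median_period` and `shares` are illustrative assumptions.

```python
# Counts from the text. Period labels and helper names are illustrative.
starts = [
    ("2001", 2), ("2002", 14), ("2003", 41), ("2004", 51),
    ("first half of 2005", 47), ("second half of 2005", 35),
    ("first half of 2006", 23),
]
typefaces = {
    "sans serif, left-aligned": 136,
    "serif, left-aligned": 54,
    "sans serif, justified": 22,
    "serif, justified": 1,
}

def median_period(periods):
    """Return the period containing the middle blog when ordered by start date."""
    total = sum(count for _, count in periods)
    middle = (total + 1) // 2  # middle position (lower middle if the total is even)
    running = 0
    for label, count in periods:
        running += count
        if running >= middle:
            return label
    raise ValueError("no periods given")

def shares(counts):
    """Whole-number percentage share for each category."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}

print(sum(count for _, count in starts))  # 213
print(median_period(starts))              # 2004
print(shares(typefaces))
```

The middle blog (the 107th of 213) falls in 2004, just before the running total reaches 108 at the end of that year, which is consistent with a median of December 2004.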
These comparisons may not make much sense. After all, last year’s investigation looked toward blogs with the broadest readership; this year’s looks at a different slice (with some overlap).
• Three of last year’s blogs are older than any of this year’s. Almost half of last year’s blogs began in 2003; this year’s group skews younger.
• Last year’s blogs typically had a lot more posts than this year’s. I believe that represents changing trends in blog authorship as much as it does the different slice of liblogs.
• A higher percentage of this year’s blogs allows comments—and five of this year’s blogs have more comments than the highest of last year’s. On the other hand, the average and median number of comments are both lower this year.
• The most conversational blogs this year are much more conversational than last year’s peak—and the average comments per post is a little higher.
• The average total length of posts during one quarter is a little lower this year. On the other hand, four of this year’s blogs are longer in total than last year’s longest; two are more than twice as long.
• On average, posts are a little longer this year than last and the median is significantly higher.
To keep these tables to reasonable lengths, I’ve defined “standout” more strictly than in 2005. None of these measures means anything about a blog’s quality or significance, but they do represent significant deviations from the norm.
These blogs were at least three years old when the study began.
2001: March, Random Access Mazar; October, The Rabid Librarian’s Ravings in the Wind.
2002: January, EngLib; March, Wigblog; April, Biblog - Bibliotek og IT, diglet; May, Helenes hengekøye; July, Lady Crumpet’s Armoire; August, The Aardvark Speaks, etc., Internetsøgning; October, blogdriverswaltz.com, Confessions of a Science Librarian, indie rock librarian; November, Retrofitted Librarian; December, DrWeb’s Domain.
2003: January, At Home He’s a Tourist, Book Kitten, Sites and Soundbytes, That Rabbit Girl; February, Archivalia, Chronicles of Bean, internetbrus.com, Max Power Blogs, The Misadventures of Super_Librarian, Pattern Recognition; March, STLQ, TangognaT; April, Bibliotekarens bibliotek, Creative Librarian, infosophy: The Playful Antiquarian, Socio-technological Rendering of Information, UK Freedom of Information Act (FOIA) Blog; May, Library Monk, Mermaid, Ref Grunt; June, DIY Librarian, Librarian, Library Autonomous Zone, Mentat, nichole’s auxiliary storage, The Ten Thousand Year Blog.
These blogs have more than twice as many posts as the average for all 213 blogs (and more than three times as many as the median).
| Blog | Posts |
| --- | --- |
| A Fuse #8 Production | 371 |
| Slaw | 323 |
| medinfo weblog | 240 |
| The Rabid Librarian’s Ravings in the Wind | 236 |
| Travelin’ Librarian | 235 |
| Out of the Jungle | 225 |
| Mermaid | 195 |
| Archivalia | 178 |
| Kids Lit | 170 |
| DrWeb’s Domain | 163 |
| Sites and Soundbytes | 161 |
| Library Mistress’s Place | 147 |
| Text & Blog | 145 |
| affordance.info | 122 |
| Library Boy | 121 |
| Digitization 101 | 117 |
| UK Freedom of Information Act (FOIA) Blog | 113 |
| Information Overlord | 112 |
| The Gypsy Librarian | 107 |
| Baby Boomer Librarian | 102 |
| A Wandering Eyre | 98 |
| OUseful Info | 96 |
| Professional-Lurker: Comments by an academic in cyberspace | 95 |
| ricklibrarian | 94 |
| Wouter over het Web! | 94 |
| Game On: Games in Libraries | 92 |
| OPL Plus (not just for OPLs anymore) | 92 |
| e-klumme | 91 |
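The cutoff for this table can be checked against the summary figures reported earlier (an average of 45 posts and a median of 27). The sketch below is one reading of the rule, not a statement of how the table was actually produced; the name `is_standout` is hypothetical.

```python
# Summary figures reported earlier in the text.
AVERAGE_POSTS = 45
MEDIAN_POSTS = 27

# Lowest post count in the table above (e-klumme).
LOWEST_LISTED = 91

def is_standout(posts, average=AVERAGE_POSTS, median=MEDIAN_POSTS):
    """True when a count exceeds twice the average and three times the median."""
    return posts > 2 * average and posts > 3 * median

print(is_standout(LOWEST_LISTED))  # True
print(is_standout(90))             # False
```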
These blogs have at least twice as many comments as average (noting that the average includes blogs with no comments), or seven times the median.
| Blog | Comments |
| --- | --- |
| A Fuse #8 Production | 798 |
| The Zenformation Professional | 597 |
| Mermaid | 422 |
| Slaw | 410 |
| Ruminations | 257 |
| ...the thoughts are broken... | 200 |
| TangognaT | 183 |
| The Misadventures of Super_Librarian | 166 |
| The Aardvark Speaks | 146 |
| Chez Shoes | 131 |
| A Wandering Eyre | 128 |
| Tinfoil + Raccoon | 125 |
| The Vampire Librarian | 124 |
| Text & Blog | 119 |
| Tales from the “Liberry” | 117 |
| indie rock librarian | 108 |
| Larocque and Roll | 107 |
| Wouter over het Web! | 103 |
| See Also | 99 |
| affordance.info | 98 |
| Travelin’ Librarian | 97 |
| The Illustrated Librarian | 88 |
| Library TechBytes | 78 |
| Lady Crumpet’s Armoire | 75 |
| ISHUSH | 72 |
| Libraries in the NHS | 72 |
| T. Scott | 70 |
These blogs have the most comments per post, at least twice the overall average (and almost five times the overall median).
| Blog | Comments per post |
| --- | --- |
| The Zenformation Professional | 9.63 |
| The Gay Librarian | 8.50 |
| The Vampire Librarian | 7.29 |
| InfoTangle | 6.33 |
| Library Bitch | 6.00 |
| The Green Kangaroo | 5.20 |
| digitize everything | 5.00 |
| Quædam cuiusdam | 3.83 |
| The Misadventures of Super_Librarian | 3.46 |
| Ruminations | 3.38 |
| indie rock librarian | 3.38 |
| The Illustrated Librarian | 3.26 |
| The Aardvark Speaks | 3.04 |
| Larocque and Roll | 2.97 |
| Tinfoil + Raccoon | 2.72 |
| etc. | 2.55 |
| Librarian 1.5 | 2.55 |
| Random Access Mazar | 2.55 |
| Assemble Me | 2.50 |
| bitter librarian | 2.50 |
| REAL PUBLIC LIBRARIAN | 2.50 |
| Chez Shoes | 2.47 |
| ...the thoughts are broken... | 2.41 |
| See Also | 2.36 |
| TangognaT | 2.26 |
| T. Scott | 2.26 |
| Mermaid | 2.16 |
| A Fuse #8 Production | 2.15 |
| Tales from the “Liberry” | 2.09 |
| Lady Crumpet’s Armoire | 2.03 |
| aleah marie | 2.00 |
| At Home He’s a Tourist | 2.00 |
|
bloggrik< |