On Wikis and Transparency
Transparency is a Very Good Thing.
Except, of course, when it isn’t.
As with most good things, there are times when transparency isn’t appropriate—and other times when we may not be aware just how transparent we are.
Obvious cases: Personnel and awards. Those are the only areas where ALA allows closed meetings: When personnel issues are being discussed and when awards are being judged.
Less obvious cases: When it’s premature to reveal everything about a situation.
Sometimes, that’s a competitive issue. I wouldn’t expect Honda or GM to have webcams in their design studios, showing us what they’re thinking about for future models. I don’t believe Google opens the doors of all its offices at all times to everyone who’d like to hear what researchers are working on for changes in their ranking algorithms or for new services.
At other times, it’s an issue of maturity. Too much transparency on a new service may get in the way of its becoming successful.
While this isn’t a lengthy discourse on transparency in general, I’ll offer an example or two:
Ø I would love to know how many Kindles were sold in that vaunted 5.5 hours between it going on sale and the first production run selling out—but for Amazon to reveal sales figures at that point could have damaged Amazon’s efforts to make Kindle succeed. (Five months later, I believe it’s legitimate to wonder why there are no sales figures.)
Ø If a reliable survey of library chat reference service had been done a month after such services were introduced, with 100% response rate, the numbers might have convinced other libraries that chat reference was a non-starter. That would (I believe) have been a mistake. It was too soon to dig into the numbers. A year after introduction, there should be useful numbers.
I’m not arguing against transparency in general. The first sentence in this essay is not there for ironic value. As a rule, I believe in transparency—but not always. (Nor do I believe anyone’s proposing universal transparency under all circumstances.)
Life gets interesting when we’re more transparent than we expected. In face-to-face situations, that’s not unusual: Body language frequently betrays what we feel, independent of what’s coming out of our mouths.
I suspect we usually think we have some measure of control in other situations—indeed, things may be less transparent than we’d like. For example, I have no real idea how many people read Walt at random—and I’ll suggest that, given aggregation, most bloggers who provide full-text feeds don’t have much idea how often their posts are being read. More to the point, you have very little idea how many people read Walt at random and even less idea how many people read Cites & Insights—unless I make a point of telling you and you believe what I say. For blogs, you can get a vague sense of some other blog’s overall readership within the two most popular aggregators, Bloglines and Google Reader: Both services will report numbers, at least to one feed in each case and at least if you’re willing to subscribe to the blog temporarily. But that’s a vague sense at best. It overcounts thanks to abandoned and unread subscriptions and undercounts thanks to other aggregators and direct readership. For other websites, all measures are wildly approximate and indirect, including Google PageRank, pop.urio.us and Alexa.
On the other hand, those indirect measures and other web resources may make some situations a lot more transparent than we’d really like. Sure, Worldcat.org doesn’t include all libraries—but when a supposedly best-selling book only shows up at six libraries on Worldcat.org a year after publication, you’d have to have severe nasal congestion not to smell something funny about the “best selling” claim. That’s just one example; there are many others.
The situation considered in the rest of this essay is different. One class of software that’s useful for many libraries—in particular the most commonly-used example in that class—defaults to remarkable transparency. You can control that transparency, but you may not want to. I will argue that you shouldn’t reduce the transparency—but that you should be aware of the transparency.
Wikis can have many uses within libraries—not just the huge wikis such as Wikipedia but also library-hosted wikis. Most library wikis use the same wiki software as Wikipedia and Citizendium: MediaWiki, created to support Wikipedia and issued as open source software, free for the taking.
MediaWiki is a good choice. It obviously scales well. Your library wiki probably won’t have two million articles or be edited by tens of thousands of people. It has lots of extensions for those who need more than the standard features—and those standard features are fairly extended. The markup language is no worse than most other wikitext systems and deviates from the norm primarily in one respect, one where I regard MediaWiki’s choice as superior. Namely, you don’t link to or create new pages by using CamelCase (words and phrases without internal spaces but with internal capitals); instead, you use explicit markup for links.
One of MediaWiki’s strengths is also, potentially, a weakness: It is extremely transparent, at least in a standard install. Anyone with ordinary read access to a typical MediaWiki can find out a lot about how that wiki is being used—perhaps more than you’d like them to know.
One mark of a standard install is the left-hand boxes: typically three or more boxed areas to the left of article text, one marked “navigation,” one marked “search,” one marked “toolbox.” Sometimes there are more boxes or a slightly different design.
“Navigation” includes one clear piece of high transparency (sometimes moved to “interaction”): Recent changes. By default, it brings up the most recent 50 changes over the most recent week—and you can adjust those to track up to 30 days and up to 500 changes. (For wikis with multiple namespaces, you can usually choose which namespace you want. You can typically also hide certain categories of changes—and, crucially for a wiki’s manager/editor, you can hide your own changes. Part of my job as Managing Editor of PALINET Leadership Network is checking Recent changes every weekday—hiding my own changes and seeing what else has happened.) Glancing at “Recent changes” for a wiki can hint at several things:
Ø If a week’s worth of changes is empty or has only one or two items, and that doesn’t change much when you go to 30 days, that means the wiki isn’t being actively edited. That may not be a bad thing, depending on the nature and intent of the wiki, but it’s an interesting thing that you wouldn’t always know about most writable websites.
Ø If 50 changes only go back for an hour or two, you know you’re at a lively site—unless you see that all those changes are from the same user or they all seem to involve deleting material or undoing other changes. Checking Wikipedia at 3:15 p.m. (PDT) on a Friday afternoon, the first 50 changes go back all of two minutes—as do the first 200 changes. I don’t think you’ll find anything like that anywhere else. Even at Wiktionary, 50 changes go back less than half an hour. At many active sites, 50 changes will take you back at least a day or two.
Ø Who’s making the changes? If it’s all one or two names, that tells you something about the wiki as a collaborative writing project, although nothing about its worth. If it’s 20 names for 50 changes, there’s a lot of collaboration.
Ø You might glance at the nature of the changes. If you see lots of cases with an IP address instead of a user and a parenthetical number in the thousands (e.g., “(+9,375)”), and a little more recently you see a named user and a negative number exactly matching the other, you’re seeing spam and spamfighting—an anonymous idiot (or bot) adding huge numbers of links, some alert editor or user reversing the change. If you wonder why more and more wikis require some level of authentication for editing, wonder no more: Any reasonably popular wiki runs into spam problems, and many wiki owners can’t afford to keep monitoring and reversing the problems.
OK, so Recent changes tells the observer something about a wiki—but only about its editorial activity, which isn’t always important, depending on the nature of the wiki.
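For anyone who would rather tally than eyeball, the hints above can be sketched in a few lines of Python. This is a minimal sketch, not a MediaWiki feature: the record layout (user, page, delta), the function names and the 1,000-byte cutoff are all assumptions made for illustration, standing in for whatever you copy down from a Recent changes listing. It counts distinct editors and flags the spam-and-revert pattern described above: a large addition from an IP address followed by an exactly matching removal by a named user.

```python
import ipaddress
from collections import Counter


def is_ip(user):
    """Return True when the user field is an IP address (an anonymous edit)."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def summarize_changes(changes, min_spam_delta=1000):
    """Summarize change records listed newest first.

    Each record is a hypothetical dict: {"user": str, "page": str, "delta": int},
    where delta is the parenthetical size change, e.g. +9375 or -9375.
    min_spam_delta is an assumed cutoff for "a number in the thousands".
    """
    editors = Counter(change["user"] for change in changes)
    reverted_spam = []
    for i, later in enumerate(changes):
        if is_ip(later["user"]):
            continue  # only named users count as the reverting editor here
        # Older edits sit further down a newest-first list.
        for earlier in changes[i + 1:]:
            if (earlier["page"] == later["page"]
                    and is_ip(earlier["user"])
                    and earlier["delta"] >= min_spam_delta
                    and later["delta"] == -earlier["delta"]):
                reverted_spam.append((earlier["page"], earlier["delta"]))
                break
    return {
        "changes": len(changes),
        "editors": len(editors),
        "reverted_spam": reverted_spam,
    }
```

A low editor count relative to the number of changes suggests a one- or two-person project; a long reverted_spam list suggests that most of the apparent liveliness is spamfighting.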
More extensive transparency lurks behind this innocent link, usually in the toolbox: Special pages.
How much can you find out about a wiki? More than you might expect. Here are three examples. I’m not going to name them. In one case, the wiki’s too new (and promising) for such scrutiny and in all cases it’s not relevant to this discussion. The page within the Special pages list appears in bold—noting that there are a lot more pages in most Special pages lists.
Statistics: The total of page views for the wiki is just over 16,000—an average of 13 page views per edit. The most viewed page (other than home and administrative pages) was viewed nearly 500 times, which isn’t bad for a very young wiki.
Orphaned pages: More than 50 pages don’t have any links from any other pages, which suggests that interlinks aren’t a primary means of navigation or that quite a few pages haven’t become part of the whole. The empty Categories page shows another typical means of navigation that this wiki doesn’t use—which only leaves searching and the table of contents on the main page. (Dead-end pages takes a different view: Pages that don’t link anywhere else. There are even more dead-end pages, around 100, but that makes sense given the nature of this wiki.)
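The two lists are mirror images of each other, which a small sketch can make concrete. Assume a hypothetical dictionary that maps each page title to the set of titles it links to (MediaWiki does not hand you this structure; it is invented here for illustration). Orphaned pages are then the ones nothing else links to, and dead-end pages are the ones that link to nothing else:

```python
def orphaned_and_dead_end(links):
    """Return (orphaned, dead_end) page titles as sorted lists.

    links is a hypothetical mapping: page title -> set of titles it links to.
    Links a page makes to itself are ignored in both directions.
    """
    pages = set(links)

    # Every page that receives at least one link from a different page.
    linked_to = set()
    for source, targets in links.items():
        linked_to.update(t for t in targets if t != source)

    orphaned = sorted(pages - linked_to)
    dead_end = sorted(
        page for page, targets in links.items()
        if not (targets & (pages - {page}))
    )
    return orphaned, dead_end
```

A page can land on both lists at once, which is the usual sign of something created and then forgotten.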
New pages: Another indication of activity—and in this case it’s a strong indication that the wiki’s being developed actively, as 50 new pages go back less than a month. Meanwhile, Oldest pages usually offers a good indication of the age of the wiki—in this case, four months.
While the Statistics page includes pageview counts for the ten most popular pages, Popular pages offers a sense of how diffuse usage is—that is, whether there are a lot of pages with reasonably high pageviews. For a very young, fairly specialized wiki, a cutoff of 50 views might make sense—and Popular pages shows that three dozen pages have at least 50 views. There isn’t an “Unpopular pages” but you can pull up bigger sets or additional sets—in this case getting to two pages with two views, and one that’s never been viewed at all. Comparing Popular pages with the page count on Statistics shows one oddity of MediaWiki counts: Some page categories aren’t included in Popular pages. Thus, in this case, the least popular page is #115—leaving forty mystery pages.
There’s a lot more. Articles with the most revisions provides insight into where the most collaboration is happening—although, without double-checking history and talk/discussion pages, it’s hard to be sure just what that means. There’s also Articles with the fewest revisions, in this case showing a fair number of “stable” or relatively non-collaborative pages—more than 20 pages with two revisions each.
Two oddities that the curious may find interesting: Long pages and Short pages. This wiki has a handful of fairly long pages (from 7,000 to 8,500 words, assuming six characters per word)—but even more pages that have been identified but have no content (20 pages with 0 bytes each).
Finally for this bit of snooping, and ignoring more than 50 other special pages, there’s All pages—which lacks a counter but shows you alphabetic lists for each namespace. What’s a namespace? A specific kind of page, typically indicated by a prefix in the pagename. For most articles in most wikis, (Main) is the namespace (and there is no prefix). But there’s also a Talk namespace (the talk or discussion pages that appear with each article—but no Talk page will be listed unless there has been text on the page), Help and Help Talk namespaces, User and User Talk—and frequently more. (For example, the PALINET Leadership Network has an Essay namespace for third-party content that’s more protected than other pages and can only be viewed by registered users—and, to be sure, there’s an Essay Talk namespace to match.)
What can we learn from All pages in this case? Eight users have text on their pages, so we can read a little about them. Five regular pages have Talk pages, frequently worth investigating in an unfamiliar wiki.
You get a fair indication of the level and kind of activity in this wiki by looking at a handful of pages, and for a typical MediaWiki install, anyone can look at those pages. Suspect that a wiki has gone dormant, not only in changes but in readership? Check Statistics one day and print it out or jot down some numbers, then check it again a week or a month later. Do be aware that your observations change reality: Every page you look at is a pageview. In this case, a recheck a week after the first draft of this article shows quite the opposite: Overall pageviews more than doubled and all other indications show a rapidly-growing wiki with significant use.
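The before-and-after check amounts to simple subtraction. Here is a minimal sketch, assuming you jot down total pageviews and total edits from Statistics on two dates; the snapshot layout and the function name are hypothetical, not anything MediaWiki provides:

```python
from datetime import date


def compare_snapshots(first, second):
    """Compare two hand-recorded Statistics snapshots.

    Each snapshot is a hypothetical dict:
    {"day": datetime.date, "views": int, "edits": int}.
    """
    days = (second["day"] - first["day"]).days
    new_views = second["views"] - first["views"]
    new_edits = second["edits"] - first["edits"]
    return {
        "days": days,
        "new_views": new_views,
        "new_edits": new_edits,
        # None when both snapshots were taken on the same day.
        "views_per_day": new_views / days if days else None,
        # Overall ratio at the second snapshot; high values suggest a wiki
        # used more for reading than for writing.
        "views_per_edit": (second["views"] / second["edits"]
                           if second["edits"] else None),
    }
```

Any figures you feed in are your own notes, and your own visits are part of the difference between the two snapshots.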
Here’s another, very different example, one that’s been around for a few years and serves a relatively small, specialized audience (but is open to anyone). You already know most of the pages I’m looking at. What can we find out about this wiki?
It’s being edited, but not heavily: 50 entries go back a week and involve only three users. Overall usage is impressive for a specialized wiki, with more than 1.2 million pageviews. As it happens, this is a wiki I’d looked at two months previously—and that makes the pageviews even more impressive, as it comes out to 175,000 pageviews in two months. At more than 48 pageviews per edit, this is a wiki used for reading more than writing. The claim is that just over 1,100 pages out of 3,500 are “legitimate content,” and that may be right in this case. Two content pages show more than 20,000 views and the 10th most viewed page is still well above 8,000—which is very good given the limited audience.
There are a lot of orphaned pages—more than 1,000—including a few that are spam and many that are supposedly visible only to special users. (That’s not true: They show up from Orphaned pages, which means this wiki may be more transparent than its managers intend. For that matter, they also show up when reached from the “Restricted” category page.)
This wiki does use categories, and there are a lot fewer Uncategorized pages than orphaned pages, so categories are a strong navigation tool (but not part of the leftside toolboxes). There are more than 1,000 dead-end pages, many of which also appear to be orphaned pages: Pages stored for convenience but not intended to be part of the main wiki navigation.
The wiki’s been around for a while and is relatively stable in terms of topics: only two new pages were added in the last month. Oldest pages suggests that the wiki started in June 2004 (with a trial entry somewhat earlier).
As for breadth of use, it’s impressive. Nearly sixty pages have been viewed more than 2,000 times; another 90 have more than 1,000 pageviews; and more than 360 pages have at least 500 views—this in a wiki with a narrow focus and a narrow audience. (More than 1,100 pages have more than 100 views!)
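Breadth of use is just a count of pages at or above each cutoff. A minimal sketch, assuming you have copied the view counts from Popular pages into a list; the function name is hypothetical, and the default cutoffs simply echo the ones used above. Note that the counts here are cumulative: a page with 2,500 views is counted at every threshold it meets.

```python
def breadth_of_use(view_counts, thresholds=(2000, 1000, 500, 100)):
    """Map each threshold to the number of pages with at least that many views."""
    return {
        threshold: sum(1 for views in view_counts if views >= threshold)
        for threshold in thresholds
    }
```

For a very young wiki, passing a lower cutoff such as thresholds=(50,) gives the same kind of picture at a smaller scale.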
What else? A handful of pages have been frequently revised (21 with more than 100 revisions) while a lot of pages haven’t involved much collaboration (more than 100 pages with two revisions and another 100+ with three). Two oddities: There are a few dozen “double redirects,” where a page has been renamed more than once, and more than 20 “broken redirects”—redirects that link to nonexistent pages.
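The two redirect oddities are easy to tell apart once the redirects are written down as pairs. A minimal sketch, assuming a hypothetical mapping from each redirect page to its target and a separate set of titles that exist as real pages (both structures are invented for illustration): a double redirect points at another redirect, and a broken redirect points at nothing.

```python
def check_redirects(redirects, existing_pages):
    """Return (double, broken) redirect titles as sorted lists.

    redirects is a hypothetical mapping: redirect title -> target title.
    existing_pages is the set of titles that exist as non-redirect pages.
    """
    double, broken = [], []
    for source, target in redirects.items():
        if target in redirects:
            # The target is itself a redirect, so readers hop twice.
            double.append(source)
        elif target not in existing_pages:
            # The target is neither a page nor a redirect.
            broken.append(source)
    return sorted(double), sorted(broken)
```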
What about extremes of length? Five pages have more than 100,000 characters (more than 16,000 words) and nearly two dozen exceed 42,000 characters (7,000 words)—the point at which MediaWiki sometimes complains about editability. Fewer than 10 pages have no content at all, but some fifty are short enough to suggest that they’re test pages. There’s nothing noteworthy in terms of namespaces.
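The word counts above come from dividing character counts by six. A minimal sketch of that arithmetic, using the six-characters-per-word assumption and the 42,000-character figure from the text; the cutoff for a "short" page and the function names are hypothetical choices made for illustration:

```python
CHARS_PER_WORD = 6           # working assumption used in this essay
EDIT_WARNING_CHARS = 42_000  # length cited above for editability complaints


def estimate_words(chars, chars_per_word=CHARS_PER_WORD):
    """Rough word count from a character (or byte) count, rounded down."""
    return chars // chars_per_word


def classify_length(chars, short_chars=200):
    """Label a page by length; short_chars is an assumed test-page cutoff."""
    if chars == 0:
        return "empty"
    if chars < short_chars:
        return "short"
    if chars > EDIT_WARNING_CHARS:
        return "long"
    return "typical"
```

With these assumptions, estimate_words(42000) gives 7000. Long pages and Short pages report sizes in bytes, which match characters only for plain unaccented text, so treat the results as approximations.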
All in all? The picture of an established specialized wiki that continues to be actively used across a broad range of content. The owners may not be aware that the “restricted” pages aren’t really restricted, but that’s about the only negative comment I can offer.
This one theoretically serves many institutions but with a relatively narrow focus—and it’s another one I’d looked at two months ago, allowing me to see how active it is currently. The wiki’s three years old and has just over three-quarters of a million pageviews—including just over 100,000 in the last two months, which is healthy activity. About 10% of all pages appear to be content pages—something over 100.
Two dozen pages don’t have links from other pages and six dozen are dead-end. While categories are definitely used, more than 150 pages lack categories—but checking a sampling of those showed strong linkage in most cases.
The wiki isn’t getting many new pages: Three in the last five months. Neither is it heavily collaborative at the moment: All edits over the last week were either spam or reversion of spam. Looking at old pages marks the start of this wiki in March 2005—with a lot of pages added in the first few months.
Breadth of use? Quite good. A fair number in excess of 10,000 views; a lot with more than 2,000 (more than 60, with another 50-odd exceeding 1,000); and nearly all of the pages that show up in this list (which typically excludes most special categories and namespaces) have more than 500 views—just over 160 out of a total 198. Basically, whatever’s there is being viewed frequently.
Some typical special pages don’t show up on this install; I can’t tell you which pages are most or least frequently revised or whether there are any double or broken redirects. On the other hand, Long pages and Short pages are here but undramatic. No page exceeds 20,000 characters (roughly 3,500 words) and only a dozen are much more than 1,000 words; there’s one empty page but only a couple more so short as to be accidental or quick definitions.
All pages shows rather a lot of Talk pages relative to the total number of articles—which usually means one of two things: The wiki has a lot of real conversation or there’s a spam problem. Clicking through to a sampling suggests that both are true—and the number of empty but created Talk pages says there’s an ongoing effort to battle spam.
If you or your institution has a wiki, particularly a MediaWiki wiki, I am not suggesting that you panic or find ways to lock things down. I believe most of this transparency is all to the good in most situations—as long as you’re aware of it.
I wouldn’t store sensitive information on supposedly restricted pages unless you’re sure they’re restricted. I wouldn’t make claims about the activity on your wiki unless internal evidence backs up those claims. Evidence from log analysis packages may be misleading (for reasons too peculiar to mention). If the log analysis package says you’ve had 85,000 pageviews and the Statistics page shows 70,000—well, I know which one I’d believe.
Sure, you can make your wiki more opaque. You can use a different wiki package. Of those I’ve observed, most seem to offer a lot less information to outsiders than MediaWiki does. Or you can modify MediaWiki to be less transparent: It’s open source software, after all.
If you look at wikindex (www.wikindex.com), “the index for wiki sites,” the MediaWiki section lists more than 3,000 MediaWiki wikis ranked by some combination of usage, size, users and updates. (It also shows some sets of wikis using other software, but none of them have ranks at this writing.) Consider the highest-ranked wikis that aren’t various Wikipedias (that is, English, German, French, Italian, Japanese, Polish, Swedish: Seven of the eight highest-ranked wikis). Wikipedia shows most special pages, although pageviews don’t appear, at least on English Wikipedia. What about some other “popular” wikis (many of which appear to come from Wikia, Jimmy Wales’ for-profit operation)?
Ø Wikimedia Commons and Wiktionary use what appears to be the same modified set of special pages as Wikipedia.
Ø Wookieepedia, the Star Wars wiki (I kid you not: it ranks 10th) includes almost all of the special pages—but omits Popular pages and omits pageviews in Statistics.
Ø wikiHow (#11), on the other hand, is transparent—the appearance is heavily modified (and the statistics page, for one, far more attractive than most), but as of early May 2008, it tells me there have been more than 281 million pageviews (more than 138 per edit) and that the most popular how-to page (“French Kiss”) has 1.8 million views—and Popular pages is there, showing 32 pages with at least half a million views (the “least popular” of those being “Make Jello Shots,” but #49 at 420K views is much more essential: “Calculate Pi by Throwing Frozen Hot Dogs”).
Ø WeRelate (#12), a genealogy wiki, has Special pages—but you have to look a little. Once there, you find most special pages but not Popular pages, and overall page views but not the most widely-viewed pages. Understanding that wikindex’ ranking isn’t entirely based on pageviews, it’s interesting that this wiki has had fewer than 7.7 million pageviews—not quite 3% of wikiHow. It’s very active—there are only 1.3 pageviews per edit! (It’s also apparently only two years old, and phenomenally active for such a young wiki.)
Ø Wookieepedia comes from Wikia, which apparently sets up lots of pop-culture wikis heavily laden with ads. #13 is Halopedia, “the definitive source for Halo information.” Special pages is in small type but it’s there—and although Popular pages is missing, the Statistics page does show pageviews (297,000, less than one per edit—and, astonishingly, less than one for every two registered users, suggesting that the counters have been reset at some point). I do note that individual pages don’t show pageviews or most recent edit at the bottom, so the transparency’s been clouded a bit.
Ø On the other hand, #14 (WoWWiki, another Wikia wiki, this one devoted to World of Warcraft) does have Popular pages (renamed “Most popular pages”), and the statistics page shows pageviews, but with two oddities: It shows 481.8 million pageviews and says counting has been disabled since mid-2007. And yet, the most popular articles apparently come directly from the database, and #1 has fewer than 5,000 views, with only seven more exceeding 1,000. Clearly, there’s a disconnect here.
Ø A little further down, the humorous Uncyclopedia (#18, which, astonishingly, has more than 23,000 articles) doesn’t show overall pageviews, and while there’s a Most popular articles page, it shows no results. Since articles also don’t show pageviews, it’s anyone’s guess as to how often people actually look at Uncyclopedia (yes, it’s another Wikia wiki).
You can make your wiki more opaque—but why bother? I regard MediaWiki’s transparency as a strength, not a weakness. Better to spend your time establishing a user setup methodology that reduces spam as a problem.
With a little awareness, wiki transparency is a good thing. If you’re wondering: To the best of my knowledge, all the special pages are available for PALINET Leadership Network. I can’t think of many library-related wikis that shouldn’t operate with reasonably full transparency. Just be aware that your wiki really is transparent.
Cites & Insights is sponsored by YBP Library Services, http://www.ybp.com.
Opinions herein may not represent those of PALINET or YBP Library Services.
Comments should be sent to firstname.lastname@example.org. Cites & Insights: Crawford at Large is copyright © 2008 by Walt Crawford: Some rights reserved.
All original material in this work is licensed under the Creative Commons Attribution-NonCommercial License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/1.0 or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.