Cites & Insights: Crawford at Large
ISSN 1534-0937
Libraries · Policy · Technology · Media

Selection from Cites & Insights 11, Number 9: October 2011

T&QT Retrospective

Far-Away Services with Strange Sounding Names

Remember Cuil? A little more than three years ago, it was all the rage—a new search engine developed by ex-Googlers using “a form of data mining to group Web pages by content.” Cuil started up on July 28, 2008, claiming to have a larger index than any other search engine—120 billion web pages at the time. The company was hot stuff: It raised $33 million in venture capital.

Back then, I printed out leadsheets from interesting discussions of Cuil, but somehow never got around to putting them together or discarding them. Looking at them now—eight of them—I see just to what extent Cuil was a two-day wonder: Five of the items are from July 28, 2008; two are from July 29; and one laggard item is from August 1, 2008. It turns out I also tagged one item on April 14, 2010. What did library folk and a few others have to say about this wonderful new search engine at the time?

Now what’s cooler than being Cuil?

That was Chris Zammarelli at Libraryola on July 28, 2008—and I can’t provide a link because Libraryola has gone the way of Cuil, although without burning through $33 million. Zammarelli did an ego search on Cuil, with pretty dismal results.

Of the first 11 results displayed:

Four of the results were dead links;

Two of the results were the same link;

Four of the results were results older than January 2008;

Two of the results displayed photos that were irrelevant to the links they were attached to.

There’s more—but I can’t discuss it, since I only printed the first page. (Libraryola is still around—but now it’s all in some Cyrillic language, other than ads, and translating the first couple of paragraphs suggests that it’s a typical ad landing page.)

Cuil Launches—Can This Search Start-Up Really Best Google?

Danny Sullivan posted this on July 28, 2008 at Search Engine Land. That’s the natural question, especially for Sullivan’s site.

Can any start-up search engine “be the next Google?” Many have wondered this, and today’s launch of Cuil (pronounced “cool”) may provide the best test case since Google itself overtook more established search engines. Cuil provides what appears to be a comprehensive index of the web, offers a unique display presentation, and emerges at a time when people might be ready to embrace a quality “underdog” service.

It’s a thorough discussion, noting Cuil’s “impressive pedigree” of founders, listing the four major areas it claimed to distinguish itself (big web index, unique relevance algorithm, unique results display, privacy) and discussing each of those.

Given that Google and Bing each now probably have many times the indexed pages that Cuil had—and that neither one mentions the index size—it’s interesting to get Sullivan’s immediate response to Cuil’s claim to index three times as many pages as Google:

Sigh. Yes, size matters. You want to have a comprehensive collection of documents from across the web. But having a lot of documents doesn’t mean you are most relevant.

That’s followed by a lengthy self-quote from September 2005 (when Google stopped mentioning its size). He found the whole discussion of size disheartening and pointless. Sullivan also pokes at the improved-relevance claim at some length, noting that Cuil seemed to be using popularity despite its claims to do otherwise.

The display difference—well, if you’re one of those who likes multicolumn sets of paragraphs rather than a nice column of results, you would have loved Cuil. Oh, and Cuil suggested search topics as you typed—which some of us still don’t much care for. Finally, Cuil claimed it wasn’t logging IP information on searches. Sullivan didn’t seem to think this mattered.

The final section of a long discussion (one that sometimes feels like an apologia for Google) is “Will Cuil Succeed?” Briefly, Sullivan thinks it could “pick up a little share, maybe a point or two,” but that it was unlikely to be a Google-beater, or even a Microsoft- or Yahoo!-beater.

Not so Cuil

That headline was used a lot on and after July 28, 2008, but in this case I’m looking at Doug Johnson’s post at The Blue Skunk Blog. Johnson did the same thing as Zammarelli—well, wouldn’t you? He ran an ego search. Of course, “Doug Johnson” isn’t the most unusual name in the world. He found the first page of results “let’s say, interesting.” I see a magazine-format page with 11 items. The first is a Wikipedia article on Doug Johnson, keyboardist for Loverboy. The second, third, fifth, and ninth are about the library Johnson. Others are for various sports-related Johnsons and one media person—and, last on the page, an odd price-comparison link. Johnson’s comments?

While I did like the Lover Boy implication and that 3 of the first 10 results were related to me, none was a direct link to either my blog or website. And the pictures are a mess. Who are these people? Not me. The little graphic of the bottle comes from my column on the Education World website but is placed next to the hit on Wikipedia that lists other Doug Johnsons. (Yes, there are quite a number of us out there.)

While one of them looks like a direct link to his website, I’ll take his word for the picture mess, especially since Zammarelli found the same problem. Johnson offers a screen shot from the same search done on Google; that one has his website first, his blog second. After that come sports figures and others. No photos and much briefer results.

Johnson doesn’t really offer a critique, other than the picture problem.

Cuil

Terry Ballard kept the title simple for this July 28, 2008 post at Librarian on the edge. Ballard wanted to see some serious competition for Google:

It's always been my fondest hope that someone would come along and give the Big G a real taste of competition. I don't have anything against Google - I just think that competition will help bring out the best in them. Naturally, when I heard about this on the morning news, I couldn't wait to try it out.

Of course he tried an ego search—and wasn’t impressed with the results. The drill-down feature on the right side suggested as a subtopic “People from St. Louis,” and Ballard isn’t from St. Louis.

Most amusingly, they add pictures to each page description. In the case of my entries, there are dozens of pictures of somebody else named Terry Ballard. Their formula really should ensure that the picture comes from the page they are describing. Enough other people were interested that their servers were swamped in the afternoon. My verdict is that I love the concept but the product isn't quite ready for prime time.

By now, a theme seems to be emerging: The presentation is interesting (although I’d find it frustrating if I wanted to plow through results)—but you shouldn’t add pictures to every excerpt unless you know enough to add the right pictures.

Wayne Bivens-Tatum used the same title for his own post—a day later, July 29, 2008, at Academic Librarian. After trying a couple of searches, “so far I don’t see why I would use this much.”

I searched “academic librarian,” for example. Of the eleven hits on the first page, four were to this blog. It’s nice to know I have such “authority,” but I thought four was about three too many. Three of the four hits had pictures of people beside them. I have no idea who the people are, but they’re definitely not me. I also searched “bivens-tatum.” The hits are all relevant, and there’s a nice spread, but again the pictures have nothing to do with me.

He also wonders about the relevance ranking:

If the top left hit is the most relevant, then apparently a Shakespeare authorship website I made in library school is the most relevant web page related to me. Maybe they know it’s the first web page I ever created, so it has a certain sentimental value.

This paragraph sums up part of my problem with Cuil’s whole approach:

The layout is presumably to prevent the need to scroll, but I would like an option in the preferences to have more hits on the first page. When I’m looking for information, I want more text, rather than a tastefully arranged page with images scattered across like knick-knacks. I might like the search results better if I wasn’t ego-cuiling, but I don’t think I’d like the layout.

cuil – search the largest web index

That title, on a July 29, 2008 post by Michael at infodoodads, surprises me a little: It takes Cuil’s claim at face value. The writeup notes a “startling black background” for the search-entry page and says that bigger is nice, but “it does little good if the information is poorly matched to the search.”

Michael’s ego search yielded his staff page in the first page of results—but it’s an old staff page, yielding a dead link. He liked the way results are presented and didn’t seem too concerned with the image-match problem, even though he does note that, on a second try, the “thumbnail” for his staff page is “from an image not found on my page.” His conclusion? “Interesting. Give it a shot!”

This is the first of the posts checked that has comments—and the first of those is particularly interesting: From someone named Mike who blogged at Buttermouth, and who admitted to being a “Google enthusiast and loyalist” (really?)—and who clearly doesn’t understand that “it’s” means “it is,” not “belonging to it”—the assertion is that Cuil found the old staff page because it only searches through websites established before June 2007. In the linked post, he calls Cuil a “33 million dollar flop or better yet, the ‘Waterworld’ of online ventures” and flatly says the company “is built on FALSE marketing and inferior results.” He also claims that the index size is a lie, based on a metric that is, in my opinion, nonsense.

Librarians Exploring Cuil

That’s the title for a Daniel A. Freeman post on August 1, 2008 at the ALA TechSource Blog—although it turns out Freeman also posted “A ‘Cuil’ New Way to Search” on July 28, 2008. That first post has an interesting core paragraph, which I’ll quote without comment:

Cuil is of particular interest to librarians because its new features attempt to provide a more nuanced, interactive set of search results. In other words, Cuil tries to emulate the experience of a more professional search, the kind you might get with the assistance of a librarian. For years we’ve been questioning effect of search engines on librarians, and due to some recent events, many of us may be wary of a search engine developing such broad power. Personally, I have trouble seeing the launch of Cuil as a detriment—call me naïve, but I think there will always be a place for reference services. Cuil, like Google before it, will probably just become another tool we can use professionally.

The August 1 post is interesting because of what seems like a defensive attitude:

In the culture of the Internet, the sound byte and 24/7 cable news networks, as soon as something is praised, it gets torn down and trounced. This process has accelerated so quickly that it sometimes seems like the two things are happening simultaneously.

This has definitely been the case with Cuil. As soon as Cuil developed a mainstream media buzz, the mainstream media was there to kill the buzz, declaring it “No Threat to Google”. As anyone who watches cable news knows, it can be tough to have a conversation when all you’ve got is two diametrically opposed sides screaming their heads off at one another.

By comparison, Freeman finds librarians’ discussion “a lot more rational and down to earth.” Sure, it’s good that librarians were exploring the service before attacking it out of hand—but the commentaries I saw (and cite above) are negative about Cuil because of the results. And I really do wonder about this final paragraph:

Google is still the unrivaled leader among search engines, and I suspect that probably won’t change for a long time. But is Cuil a big deal? Absolutely. In a time when conglomeration and monopolization limit so many of our choices, Cuil is a reminder that as long as there is freedom of ideas, there will be freedom of choice. It doesn’t matter if Cuil is a threat to Google or not. As the first high-profile effort to try to improve upon Google’s core model, Cuil matters.

It’s hard to remember the state of the art in July 2008, but I thought that both Yahoo and Microsoft (I guess it wasn’t called Bing back then) were challenging Google’s model. I certainly agree that monopolization isn’t great (and wish more librarians seemed concerned about single-supplier futures, rather than welcoming and pushing towards them), and I use Bing as my default search engine.

Cuil CEO Rips Users, Asks Them To Please Shut Up

Now—ignoring hundreds of other items from the second half of 2008—we jump forward to April 14, 2010 and this Michael Arrington piece at TechCrunch. Arrington notes what happened with Cuil: Its early poor performance yielded not only criticism but poor continuing use. Come 2010, the company was launching “cpedia,” an attempt to create “automated articles about queries.” Arrington found the results—which, of course, now yield dead links—“sort of strange, but as an experiment it certainly have legs.” Having seen other attempts to auto-generate articles or useful pages, I’d start out skeptical and probably get more so. In any case, that’s not the heart of this item.

This is: After some negative comments on the new attempt, Cuil’s CEO wrote the kind of blog post a CEO should never write. It begins “Wow, the haters are out in force today” and adds this swipe at active web writers:

First up, Cpedia does very badly with people who write much more on the web than people write about them. Given the 1 billion people on the web one might think this unlikely, but it happens. When we try to summarize the information mentioning these people, we run into a problem. Almost none of it is about them. It’s about random things they have opined on. Dave Parrack, Farhad Manjoo, Louis Gray, I’m talking about you.

He continues, noting how Cpedia builds its so-called “articles”—assembling sentences from other sources, with links—and offers a truly unusual commentary on people’s assertion that the Cpedia results are lousy:

A third complaint was that our machines did not seem to really understand the material. People complained of rote recitation, rather than an in-depth understanding. It was ever so. As a child I was made to learn Irish. The Christian Brothers believed in a Platonic theory of learning, where all knowledge was recollection, so they would beat us with leather straps until we “remembered” our Irish vocabulary (this actually works). I, however, could never get full marks, no matter how well I remembered, because my Irish, while technically correct, had no “blas”.

Blas, for those of you not from the West of Ireland, is the polish a hurley gets from the sliothar when used by a player of unusual skill, a patina on the surface of the wood testifying to the depth of talent of the player that had used the stick. Fair enough. Cpedia does not have blas – it’s a machine.

Huh? Then comes the claim as to what Cpedia actually does:

Cpedia is not an attempt to build something that knows all current knowledge and can write a meaningful essay on any topic – that would be a stretch goal. Rather, we are trying solve a much simpler problem. When people search the web for information, a lot of times the first few results do not contain all the information there is about the subject. Almost no one can continue through all the other pages, because they are almost all regurgitations of the same material, with perhaps a few extra nuggets. Cpedia processes all the pages about a topic, and extracts the unique ideas.

That would be impressive—if a computer could actually do it. Could it? Could Cpedia?

Then things get strange at the very end:

The promise of Cpedia is that you will find information that you might otherwise miss. It often works for me. Your mileage will vary. If you find that the page about you is completely random, the only advice I can offer is a poem my six year old recited at breakfast:

A wise old owl sat in an oak,

The more he heard, the less he spoke,

The less he spoke, the more he heard,

Why aren’t we all like that wise old bird.

In short: If you try Cpedia and the results are crappy, shut up about it.

What happened with Cuil? According to Wikipedia, it reached a peak of 0.2% of web traffic in late July 2008 (just after startup), dropped to 0.02% by September 2008, and fell to 0.005% in October 2008. Remarkably, it lasted until September 17, 2010, at which point it was shut down, with employees informed they wouldn’t be paid. (As always, the Discussion page for Wikipedia’s article may be more interesting than the article, with many of the comments coming on July 28, 2008.)

Who cares?

Why spend close to 3,000 words on a one-week phenomenon that’s long since disappeared? I think it’s instructive to look back at things like this now and then. You may disagree. In this particular case, I’d argue that Cpedia was nonsense from the beginning—and that Cuil’s display confused æsthetics with usability, making it an attractive nuisance. On the other hand, the image problem was just plain faulty design and operation: Insisting on an image with every search result is nearly sure to lead to misleading outcomes.

Basically, Cuil just didn’t work very well. The results display took too much space. The images actually got in the way—they didn’t help find the right results because they were wrong so much of the time. And the index itself was apparently old. Add to that operational problems (some sites found that Cuil’s crawler was causing problems, many people found that they couldn’t get to a second page of results), and it’s scarcely surprising that Cuil cooled off very rapidly.

Then there’s Knol

Remember Knol? I do. It was an interesting attempt to provide a signed alternative to Wikipedia—that is, articles by identified experts with clear writing voices, not the bland, “neutral” assemblages that Wikipedia articles tend toward.

It came from Google—and that might have been a weakness as much as a strength. Oddly enough, the timing’s similar: Knol opened for public use on July 23, 2008. By January 2009, it was up to 100,000 articles—but, since articles can be advertorials and there can be many articles (by different authors) on the same topic, that may not mean much. It’s Google, so it requires real names as Google defines them (an interesting issue), and it uses CC BY licenses (although individual authors can substitute BY-NC licenses). Interestingly, Knol uses “nofollow” on outgoing links—so that Knol links won’t affect search engine rankings.

I looked at Knol early on. I liked the idea in some ways—I believe the required anonymity and deliberate lack of writing style both damage Wikipedia’s usefulness—but I didn’t sign up, at least partly because Knol required verification with a credit card or phone number, partly because I felt no need to attempt “authoritative” articles and never lacked ways to get my own personal writing out there.

Knol is still around, but there have been no new announcements or release notes since December 2009. The address is knol.google.com. When I checked the site on September 2, 2011, “What’s new?” articles were edited as recently as 19 minutes and one hour previously, but they were all editing changes. I’d say Knol isn’t in the public eye, but clearly still serves many special audiences. Notably, it’s still explicitly marked beta, more than three years after it became publicly available, unlike Google+, which lost the beta mark almost immediately. I don’t see any indication of total number of articles; that may be just as well, given that an article can be almost anything. (Checking Librarianship as a search, one of the articles is, well, a personal webpage. The only connection to librarianship that I can see is that the article includes a list of libraries holding a particular title and, probably the reason for the result, a citation of an article in Issues in Science and Technology Librarianship.)

Exploring a little further

Knol is still there: That much is clear. Alexa doesn’t show traffic statistics for the site (which is a subdomain of Google); apparent alternative names are, as I’ve grown to expect, parking pages or dead.

A search for the phrase “library 2.0” yields only a page in some Arabic language. Without the quotes, 59 sites show up: the Arabic site first, a long and odd article “Knol Citation Goes Mainstream” second, and an odd mix of sites after that, including “Essenes: Did they believe in Jesus,” several iPhone-related items, still more self-references (“Knol First & Second Year Odyssey” by the same authors) and many more. (The “odyssey” says that page views passed one million in 2010, with “about 110” new articles. There’s clearly a missing qualifier here; those stats cannot be for all of Knol.) Articles in English that are actually about Library 2.0? I didn’t find any.

To try to get a slightly better sense of the site’s current nature and activity, I tried a few things:

· Looked at “top authors” in English. The first one, Murry Shohat, has 314,000 views for 22 knols—including “How to Quickly Write a Basic Article Review” (93,000 views!), “Toward a Pragmatic and Dynamic Knol Library,” “Knol Writing Tips,” “Move that stuff: Pump Craigslist Ads with Big Pictures” and “Knol Help 911.” Oh, and “The Who’s Who of Knol,” “Knol Top Authors with High Page View and Badges,” “Knol Site Metrics Reveal Good, Bad & Ugly” and “Plagiarism on Knol.” Sense a theme here? The second one, Peter Baskerville, has about 140 Knols—and most of those in the first 20 have fewer than 100 views (and are very specific accounting topics). Ah, but here’s one with 13,000 views: “Knol—its possibilities.” Indeed… Third, Jagadeesh M, proclaims himself an SEO. Fourth—and the first I’ve encountered with more than one million pageviews—is Krishan Maggon, a pharmaceutical consultant with about 168 knols to his credit.

· Let’s look at recent articles in a couple of areas, where “recent” is from August 1 through August 31, 2011 (searching on September 2, 2011). “Librarianship” yields 17. First: “Publishing your Scientific, Technical or Medical manuscript”—which is really “about” open access publishing and largely a pitch for iMedPub, a “crowdsourcing medical publisher” that is not an OASPA member. Second: “Resume Guide.” Third: “Rosetta Stone.” All the rest: sections of George Peabody’s A-Z Handbook of the Massachusetts-Born Merchant… Knols that are even slightly relevant to librarianship: None, as far as I could see. How about Blu-ray, a fairly popular term? Sixteen articles—how to rip Blu-rays for the Mac, another two or three how-to items, an ad knol for a wedding video firm, and a whole bunch of knols by Anonymous, rich with odd wording and legal issues.

· Well, how about Open Access? Narrowing the search to exclude the phrase within contents (as opposed to title, summary and other elements), I get down from 237 to 52. It’s an odd mix, with a fair number of items from PLoS, iMedPub and other OA publishers, and nothing I’d consider to be a useful independent discussion.

· Did I mention odd wording? How’s this for an article title: “Epson 8350 – the quite finest Epson that I in fact recommended” with the following abstract:

I purchased the Epson 8350 to alternate a five-12 months-age-old Sony 720p projector in my family space. Like all projectors, your app and rewards will rely strongly on your own individual dwelling possibility and lighting illnesses. My space is not going to be a devoted theater area, and has some ambient lgt through the evening.

Maybe that’s a good place to stop. Clearly Knol is being used by some medical folks and scientists. Equally clearly, it’s rife with articles that wouldn’t make the cut anywhere else, except, maybe, blogs. “Lighting illnesses”? Authoritative, perhaps, but not for me. (This particular writer has 15 knols to date, with a total of 375 pageviews. The one with the most pageviews, “lg bd590 best price,” is fascinating, and since it’s published under a CC BY license, I can quote as much as I like as long as I credit wester taslim. Here’s the summary.)

Introducing the particular major Blu-Ray Disc™ Individual in which may possibly merchant at the same time since movement! The particular precise BD590 gives someone the particular really very best with all the Net as well as wi-fi access in order to be able to NetCast, nonetheless that’s not necessarily each and every. Obtaining any 250GB difficult hard drive, almost all of the desired discretion may possibly have a home in 1 area, allowing one to right away recognize fresh audio, images, residence movies at the same time since LARGE CLASSIFICATION VOD by means of Vudu™. Whenever 1 gizmo may possibly offer this kind of distinct numerous residence discretion selections, an individual truly must find out —is this kind of the particular Blu-Ray Disc™ Individual, as well as several point significantly far better?

Honestly. I can’t make up stuff like that. Reading the whole article, I honestly couldn’t be sure exactly what was being reviewed, although it seemed to be a Blu-ray player with a hard disk.

Cites & Insights: Crawford at Large, Volume 11, Number 9, Whole # 144, ISSN 1534-0937, a journal of libraries, policy, technology and media, is written and produced by Walt Crawford.

All original material in this work is licensed under the Creative Commons Attribution-NonCommercial License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/1.0 or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

URL: citesandinsights.info/civ11i9.pdf
