Cites & Insights: Crawford at Large
ISSN 1534-0937
Libraries · Policy · Technology · Media

Selection from Cites & Insights 5, Number 5: Spring 2005

Bibs & Blather

A Little Spring Cleaning

This Spring issue—timed appropriately—appears for three reasons:

Ø Some “spring cleaning,” adding new thematic sections and recognizing that old ones are defunct.

Ø Another of my periodic admonitions to get out of town, timed about when you should be making summer vacation plans (if you haven’t already). (See Perspective: Go Away—Not Now, But Soon!)

Ø Ancillary decisions, one ready to announce and discuss here. (See “HTML: An Internal Conversation” below.)

Thematic Changes

Cites & Insights thematic sections come and go. Ebooks, Etext and PoD is gone as a separate section. Censorware Chronicles is dormant if not defunct. I dropped Cheap Shots & Commentary almost two years ago. Looking at the mounds of material that I want to discuss and the essays I’ve been writing, the time seems right to add some new sections as well.

Copyright Currents

I’m dividing Copyright Currents into four sections, based on the four-part view I suggest in Library Technology Reports (forthcoming):

Ø ©1: Length and breadth (“copyright universal and everlasting”)

Ø ©2: The commons: Public domain, derivative works, initiatives such as Creative Commons

Ø ©3: Balancing rights: fair use, first sale, digital restrictions management, piracy…

Ø ©4: Locking down technology

My hope is to offer shorter sections more often, sometimes more than one section in an issue. If there’s a set of issues that won’t fit in those subcategories, Copyright Currents remains available.

Net Media

I can’t seem to get away from blogs, RSS, wikis, and the other tools and religions of internet culture. Think of this new section as an offshoot of Trends & Quick Takes on one hand and The Good Stuff on the other. My first name for this section was “The Infosphere.” But I’ve made fun of others for always wanting to use a neologism when there’s already a perfectly good term. Since blogs, wikis, and these other things are basically just media that depend on the internet, I’ll call them that: net media. In general, Net Media sections will relate internet-based media to libraries—but don’t count on it.

HTML: An Internal Conversation

Ever since I started Cites & Insights, there have been those who expressed desire for an HTML version. In most cases, it was a polite suggestion. In a few, it was a demand, once accompanied by profanity over my refusal to produce the publication the way this (presumably former) reader desired.

I believe that I had (and have) good reasons for doing C&I in PDF form—and that those reasons are ecologically sound. Here’s what I’ve said in the C&I FAQ, which I suspect most of you haven’t read:

Why are issues PDF rather than HTML?

Ø Issues are too long to read comfortably at the computer…typically 14 to 20 pages, two columns each, with each column wide enough for a screen.

Ø The two-column print format yields a reasonably compact print version. A screen-optimized HTML version would be much longer. (A reasonably-formatted HTML version of a 20-page issue would use at least 30 print pages.)

Ø I care about typography and the PDF package retains the typography of the original.

Don't you dislike PDF as a single-owner proprietary format?

Yes. But I really care about typography.

Acrobat Distiller lets me use the typefaces I like and know that you'll see the same typefaces on your copy, and I didn't have to switch from TrueType to PostScript.

It's a compromise between my open-format principles and my desire to distribute this newsletter looking the way I want it to look. Life is full of compromises.

Four years later…

When I prepared those notes (which have been refined over the years), C&I used Arrus BT and Friz Quadrata BT, two superb Bitstream typefaces available to anyone using one of several Corel products such as Ventura Publisher or Corel Draw. I knew most people would not have those typefaces installed. I did not know of any generally installed text face that I considered nearly as readable as Arrus; I still don’t.

I suggested that a 20-page issue (with side margins) would use at least 30 print pages. I was conservative: An HTML version of a 20-page issue without such margins runs 38 to 42 pages.

This year, I upgraded the typography: Body type is now Berkeley Book (Berkeley for boldface). It’s not quite as good as Arrus for on-screen reading, but it’s even more readable and handsome in print. It’s also smaller, so I’ve increased the point sizes in C&I to compensate.

Meanwhile, I thought about the essays in C&I, their potential reach, and whether strict adherence to PDF was an obstacle to that potential.

The soft test

A couple of weeks after producing the February issue (5:3), I generated HTML versions of each story (with a standard header and footer). I uploaded those versions and provided links from the “all contents” version of the 5:3 table of contents, but not the contents on the home page. I discussed the postings and level of response in C&I 5:4 (Perspective: The Dangling Conversation). Briefly, between those versions and a second set of HTML files generated with C&I 5:4, I received comments from at least 36 people. Here’s how I summarize those comments:

Ø Eight people preferred PDF, didn’t see much use for HTML (particularly if the internal links aren’t live), and basically said “it’s not broke, don’t fix it.”

Ø Fourteen people offered split comments—they read and like the PDF, but they can see the virtues of HTML as well, particularly for individual-article inbound links. A couple of them couldn’t see much point to HTML if the internal links weren’t live.

Ø Twelve people favored HTML. One of them said that I “need to do” HTML. Nobody was abusive. Several seemed to assume that HTML versions would automatically have live links and that I’d provide a nice overall navigation structure, essentially doing a full HTML version of C&I.

Ø The other two were discussing tools and methods for me to do good HTML—or, in one case, an interesting suggestion for solving a different problem than I’m trying to address.
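As a quick check on those counts, here is a minimal Python sketch of the tally. The category labels are shorthand invented for this illustration, and treating the "split" group as continuing PDF readers is an assumption drawn from their saying they read and like the PDF.

```python
# Response counts as summarized in the list above.
# The dictionary keys are illustrative labels, not terms from the survey itself.
responses = {
    "prefer PDF": 8,
    "split (like PDF, see value in HTML)": 14,
    "favor HTML": 12,
    "tools and methods only": 2,
}

total = sum(responses.values())

# Respondents who actually expressed a format preference.
expressed = total - responses["tools and methods only"]

# Assumption: the "split" group keeps reading the PDF alongside the PDF-only group.
staying_with_pdf = (
    responses["prefer PDF"] + responses["split (like PDF, see value in HTML)"]
)

print(f"{total} respondents; {staying_with_pdf} of {expressed} "
      f"({staying_with_pdf / expressed:.0%}) likely to stay with PDF")
```

Under that assumption, the four groups add up to the 36 respondents mentioned above, and 22 of the 34 who stated a preference would keep reading the PDF.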

The original set of HTML files for 5:3 (the files with .HTML extensions) had truly atrocious HTML markup—markup so bad that the text face varied back and forth between my face of choice (Book Antiqua/Palatino) and the user’s default text, sometimes within the same paragraph. That can be hard to spot. I finally set my default text to Engraver, a “currency” typeface that can’t be mistaken for any normal text face. I was horrified by the results. The second set of 5:3 files (with .HTM extensions) and the selected files for 5:4 (also with .HTM extensions) used a lower-overhead method that produces much cleaner HTML.

Methodology

Let’s talk about those generation methods a little—understanding that any HTML equivalents must be quick, easy, no-learning-curve extensions of the Word-to-PDF production process. I do Cites & Insights on my own time, as with all other writing. That time typically amounts to an hour a day, if there aren’t other demands, plus a few hours on some weekends. Each hour spent messing around with the publication process takes a day away from reading, writing, and relaxing. I’m protective of those slots, particularly since I like to “waste” some of them on non-computer activities. Up to now, it’s taken two to four hours to turn a set of articles into an issue (copyfitting and final editing), half an hour to an hour to modify the C&I pages and upload the issue, and another half-hour to an hour to update the running volume index and update the raw material files to eliminate what’s been published. I might be willing to add another half hour to the publication process to produce HTML files if they seem useful rather than distractions. I would not be willing to add another two hours—or to adopt a process with even 5 hours’ learning curve.

The bundled tool I use to maintain my simple web pages, Symantec Visual Page, is truly minimal: The FTP client works just fine, as does the HTML editor, but it doesn’t import anything but the text from word processing files. It was inadequate for this job.

Being cheap, I tried something else: Web Page Creator from Cosmi Corporation, part of the $5-$10 Swift Jewel series carried by Office Depot. Web Page Creator does read Word documents and generate HTML versions. The reason it reads Word documents fairly well is apparent from the actual install process. Namely, “Web Page Creator” is the OpenOffice HTML editor—what you get on the CD is OpenOffice 1.0.2 in its entirety. One “selling” point for OpenOffice is Microsoft Office file compatibility. Indeed, OpenWriter and the HTML editor both read Word files nicely, including template-based styles. So what I got for $5 was OpenOffice.

Turns out, as those early .HTML versions show, the HTML editor does truly crappy HTML when fed Word template-based documents. It insists on paragraph-by-paragraph typeface and point size assignments (there is a CSS section, but it’s commented out). It loses track of the typeface, so you lose typeface integrity. Given its druthers, it assigns text to some oddball typeface, Thorndale, which I’ve never heard of. The typeface isn’t installed by the OpenOffice install process. That’s a first for me: A program that defaults to a nonexistent typeface.

I still have OpenOffice on my PC, but I’m not sure why. The HTML editor may be fine when working from scratch—but then, so is Symantec Visual Page. I needed something a lot more automatic.

I know how much people have reviled Word’s Web output—and I know that the first time I tried it, it was ghastly: Enormous, complicated, uneditable.

There’s another option in Word XP (Word 2002) and, presumably, newer versions: “Web page, filtered.” I took C&I 5:4, switched to the “web template” (which switches in Book Antiqua instead of the two print typefaces and eliminates a few niceties), replaced the first-page banner with a “Selection from…” HTML header, and generated selected sections by the simple process of loading the whole document, eliminating all but the one story, and Saving as… repeatedly. Total time: less than 15 minutes for the whole set of stories—much faster than the OpenOffice process.

That’s what you’re seeing in the current .HTM versions: Word “filtered web” output with no modifications after the fact. It’s not great HTML, but it’s not terrible. It uses CSS, albeit embedded in each file rather than as a separate file. The typography is intact and consistent. Macs, most of which don’t have Book Antiqua, seem to degrade nicely to the default serif typeface; I can live with that. By modifying the “Properties” tab before saving each article, I get the title I want and some keywords as well—again, not great metadata, but good enough.

Talking to myself

Should I do this or not? Here’s how the internal discussion went…

“Geez, Walt,” I say to myself, “that’s really not what I had in mind. It’s clear that printing out HTML will use twice as much paper as the PDF form, maybe more. It’s clear that a printed version that ‘only’ takes twice as much paper will be a lot less readable than the PDF, since the print columns are too wide for optimal readability. This is a bad idea.”

Yeah, but the HTML versions will allow inbound links to specific articles, encouraging readership outside the community that has any interest in the whole thing. That’s good, if you care about what you write—it spreads the messages more broadly. And the HTML version doesn’t look that bad, even if it isn’t as pretty as the PDF.

“You’ve worked hard to make the PDF form attractive, readable, well-organized. What if people abandon the PDF for less-readable, less well organized, paper-wasting HTML?”

I don’t think that will happen, based on the comments: 22 of 34 will continue with PDF, and I assume that’s true of most who didn’t respond. Besides, if people really want to read on screen, the HTML is much better.

“That’s another problem. Most C&I essays are too long to be read on screen, and I believe there’s good evidence that some people who do read them on screen don’t fully comprehend what they’re reading.”

Who died and made you the arbiter of reading styles? If people want to read on screen, you can’t stop them—and why should you? Do the damn HTML; you’ve come up with an easy method that’s not ugly.

“Well, yes, but there’s another problem. My sense is that online text tends toward short and snappy: Brief thoughts expressed briefly. If I see I’m getting lots of HTML readership, my natural tendency would be to start making paragraphs shorter, sentences simpler, thoughts cruder. I’ll be inclined to dumb it down and substitute black-and-white thinking for the gray that now dominates C&I. I don’t want that to happen.”

Bull. There’s no reason to believe you’ll lose PDF readership. Your prose style ain’t all that hot anyway, but nobody’s going to force you to dumb it down just because some people read it on the screen. You’ve seen enough blogs and websites with multi-thousand-word essays. As long as HTML is an offshoot, this particular fear is just dumb.

“Maybe that’s true. And it’s probably true that I really shouldn’t care as much how people read this stuff, as long as they do read it. Just because I don’t want to read more than 500 words on the screen (but frequently do), just because even studies of the ‘digital generation’ seem to show a similar revulsion among most of them for extended on-screen reading and understanding…well, so what?”

Now that I’m through talking to myself…

I’m adding selective HTML to Cites & Insights. The four articles from C&I 5:4 will stay there indefinitely. I’ve added cleaner .HTM articles for 5:3 and new selective .HTM for 5:1 and 5:2. By the time this issue appears, I’ll have added HTML for the last two issues of C&I 4, and I intend to go back through that entire volume. I may or may not do volumes 1 through 3. If I do, I’ll announce them at the C&I Updates blog, on my LISNews journal, and elsewhere as appropriate. You’ll see links in the table of contents for some or all of this issue’s articles and for issues in the future. Those files will stay mounted indefinitely. (“Permanent” doesn’t fit web content very well…)

When I say “selective HTML,” what do I mean?

Ø Any article that takes up more than about 40% of an issue will not have an HTML version: That’s a pure waste of paper.

Ø Most Bibs & Blather, Following Up, Feedback, and other “internal” sections won’t have HTML versions. There may be exceptions.

Ø If I believe an article makes sense only or primarily within the context of a complete issue, I won’t do an HTML version.

Ø If an article appears in HTML, the whole article will appear. Selectivity will be at the level indicated by each issue’s table of contents.

Don’t expect live links within the HTML. Don’t expect snazzy title-based URLs for the HTML files. The URL pattern will be “vNiMX.htm,” where “N” is the volume number, “M” is the issue number, and “X” is a lower-case letter tagging the story, starting with “a.” Thus, the first HTML piece from this issue will be at citesandinsights.info/v5i5a.htm
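For anyone who wants to build or recognize these addresses programmatically, the following is a minimal Python sketch of the "vNiMX.htm" pattern. The function names are hypothetical, and the limit of 26 stories per issue (letters "a" through "z") is an assumption; the text does not say what would happen beyond "z".

```python
import re

# Host as given in the text; no scheme is added here.
BASE = "citesandinsights.info"

def article_filename(volume: int, issue: int, position: int) -> str:
    """Return 'vNiMX.htm' for the position-th HTML story (1-based) in an issue."""
    # Assumption: at most 26 HTML stories per issue, tagged 'a' through 'z'.
    if not 1 <= position <= 26:
        raise ValueError("position must be between 1 and 26")
    letter = chr(ord("a") + position - 1)
    return f"v{volume}i{issue}{letter}.htm"

def article_url(volume: int, issue: int, position: int) -> str:
    """Return the host-relative address of an HTML story."""
    return f"{BASE}/{article_filename(volume, issue, position)}"

def parse_filename(name: str) -> tuple[int, int, int]:
    """Split 'vNiMX.htm' back into (volume, issue, 1-based story position)."""
    match = re.fullmatch(r"v(\d+)i(\d+)([a-z])\.htm", name)
    if match is None:
        raise ValueError(f"not a vNiMX.htm filename: {name!r}")
    volume, issue, letter = match.groups()
    return int(volume), int(issue), ord(letter) - ord("a") + 1

# The first HTML piece from volume 5, issue 5.
print(article_url(5, 5, 1))
```

The example call reproduces the address given above for the first HTML piece from this issue, and `parse_filename("v5i5a.htm")` recovers volume 5, issue 5, story 1.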

Feel free to link directly to articles. All articles link to the issue. All use the same Creative Commons “BY-NC” license as the issues. They are, to be sure, easier to quote from and forward to others.

I should add that YBP made it very clear that there would be no pressure of any sort from them for me to add HTML or make any other format changes.

Cites & Insights: Crawford at Large, Volume 5, Number 5, Whole Issue 61, ISSN 1534-0937, a journal of libraries, policy, technology and media, is written and produced by Walt Crawford, a senior analyst at RLG.

Cites & Insights is sponsored by YBP Library Services, http://www.ybp.com.

Hosting provided by Boise State University Libraries.

Opinions herein may not represent those of RLG, YBP Library Services, or Boise State University Libraries.

All original material in this work is licensed under the Creative Commons Attribution-NonCommercial License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/1.0 or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

URL: citesandinsights.info/civ5i5.pdf
