Cites & Insights: Crawford at Large
ISSN 1534-0937
Libraries · Policy · Technology · Media

Selection from Cites & Insights 4, Number 9: July 2004

Trends & Quick Takes

What’s My Tune?

Walt Crawford

A news feature by Kinley Levack in the April 2004 EContent discusses recommendation engines for music from SavageBeast and Siren Systems. These engines supposedly have “highly advanced methods to determine what kind of music is similar to whatever your musical taste du jour may be that are far more intuitive and intelligent than a traditional text-based search.” The engines “analyze hundreds of attributes of songs in order to best categorize each selection.” SavageBeast looks at some 400 different traits in a song; Siren Systems looks at “700 data points” in songs. Both systems combine human and machine “intelligence” to categorize songs.

Do they work? The example shown for SavageBeast, Billy Joel’s “Piano Man,” offers overall recommendations and includes a bunch of “focus traits” so you can decide what “more like this” actually means to you. If I think of that song, I’ll go along with “storytelling lyrics,” “harmonica,” and maybe “folk influence”—although I don’t think of “Piano Man” in terms of mandolin or accordion, and for that matter “piano solo” strikes me as odd. The songs recommended in the example? Not bad for “storytelling lyrics.” Siren Systems’ “Soundflavor” has so few songs (5,000 or so, compared to a still-small 350,000 for SavageBeast) that it’s hard to draw any conclusions. I certainly agree that text matches don’t make sense for “more like this” music selection and that genre matches are awfully crude for individual songs. Where I take issue with the company spokespeople is when they overgeneralize, as product advocates typically do: Recommendation engines as “the logical next generation of search” and a claim that makes sense only in a shadow universe where only one model can win: “In the long run, a metadata model combined with collaborative capabilities is the one that will win out.” It’s never that simple—and if these systems scale and can be provided cost-effectively, it doesn’t need to be.
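For readers curious what trait-based "more like this" matching might look like in the simplest case, here is a minimal sketch. It is not how SavageBeast or Siren Systems actually work; the article gives no such detail. It assumes each song is reduced to a set of trait labels and ranks songs by overlap (Jaccard similarity), with optional "focus traits" narrowing the comparison. The catalog entries and the function names `jaccard` and `more_like_this` are hypothetical.

```python
def jaccard(a, b):
    """Share of traits two songs have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def more_like_this(query_traits, catalog, focus=None):
    """Rank catalog songs by trait overlap with the query song.

    If focus traits are given, only those traits are compared, which mimics
    letting the listener decide what "more like this" should mean.
    """
    focus_set = set(focus) if focus else None
    query = set(query_traits)
    if focus_set is not None:
        query &= focus_set
    scored = []
    for title, traits in catalog.items():
        candidate = set(traits)
        if focus_set is not None:
            candidate &= focus_set
        scored.append((jaccard(query, candidate), title))
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in ranked if score > 0]


# Hypothetical catalog; trait labels are borrowed from the "Piano Man" example.
catalog = {
    "Song A": ["storytelling lyrics", "harmonica"],
    "Song B": ["piano solo"],
    "Song C": ["electric guitar"],
}
piano_man = ["storytelling lyrics", "harmonica", "folk influence", "piano solo"]

overall = more_like_this(piano_man, catalog)
piano_only = more_like_this(piano_man, catalog, focus=["piano solo"])
```

Even this toy version shows why the focus traits matter: the overall ranking and the "piano solo" ranking put different songs first.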

Flexible Electronic Displays

“Nothing beats paper when it comes to displaying readable text in a comfortable, familiar form factor. That’s one of the reasons that the ebook market has yet to take off.” Those are the lead sentences in an EContent news feature (April 2004, 10-12) by Geoff Daily that discusses progress in the “not-so-new” technology of epaper, which—as Daily notes—has been around for 25 years or so!

The military has put lots of money into R&D for flexible displays for its own use, which may involve a different set of criteria than the supposed “last book” or newspaper replacement. The story is weakened a bit by the color photo showing SmartPaper at work in a signboard: Whether because of resolution problems or something else, the lettering on the sign is pathetically ugly and so crude that you have to double-check to tell an “n” from an “m.” If that’s what Gyricon can do with big letters, they’re a long way from having acceptable text at normal text sizes.

Rescuing Old Recordings

An interesting news piece (April 20) describes a new audio preservation and restoration technique developed at Lawrence Berkeley National Laboratory. This technique uses a set of silicon detectors, originally designed to search for the Higgs boson, to scan the grooves of a record with very high precision. Supposedly, algorithms used to eliminate noise in particle data recordings also work well to eliminate scratches and other flaws in the recording, after which the scan can be “played” on a “virtual record player.”

While the piece discusses vinyl, that’s loose journalism: It discusses “more than a million old vinyl records” in the British National Library Sound Archive and uses as an example a 1950 Leadbelly recording. When I read that, I thought, “Hmm. More likely to be a shellac 78 than a vinyl 33 or 45.” Searching for other news stories confirmed that suspicion: “Vinyl” is used in this article, sloppily, as shorthand for “physical analog recording.” Most recordings likely to be preserved and restored using these methods would be shellac, or wax cylinders, or other pre-vinyl forms.

The idea of playing a record without contacting its grooves isn’t new; there is an expensive turntable that uses lasers to read the grooves. But the scanner should yield much more information than laser reading, and the software techniques for differentiating scratches from actual recorded information are probably more sophisticated.

The story included links to two sound files from that 1950 Leadbelly shellac recording of “Goodnight Irene”—one representing the original, the other after the scan-and-restore process was applied. The difference is astonishing, reducing heavy surface noise and scratches to a low level of surface noise with no apparent damage to the recording itself. Good stuff!

So Many Books!

According to Bowker, a staggering 175,000 new book titles and editions were published in the U.S. last year—19% more than the incredibly high figure for 2002. That includes 17,000 general adult fiction titles, 16,000 juvenile titles, and 12,000 titles from university presses. Overall, new titles have increased by 50% since 1994—and those titles come from a record 78,000 publishers, nearly 11,000 more than in 2002. More than 16,000 publishers are in California, more than twice as many as in New York—although New York City still has more publishers than any other city.
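A quick back-of-the-envelope check on those figures: the sketch below works backward from the rounded numbers quoted above to the totals they imply for earlier years. The variable names are mine, and the results are only as precise as the rounded percentages allow; they are not Bowker's own figures.

```python
titles_2003 = 175_000          # new titles and editions, as quoted
growth_over_2002 = 0.19        # "19% more than ... 2002"
growth_since_1994 = 0.50       # "increased by 50% since 1994"

# Invert each percentage increase to estimate the earlier total.
implied_titles_2002 = titles_2003 / (1 + growth_over_2002)
implied_titles_1994 = titles_2003 / (1 + growth_since_1994)

# "nearly 11,000 more" publishers than in 2002, so this is a rough lower bound.
implied_publishers_2002 = 78_000 - 11_000

print(f"Implied 2002 titles: about {implied_titles_2002:,.0f}")
print(f"Implied 1994 titles: about {implied_titles_1994:,.0f}")
print(f"Implied 2002 publishers: roughly {implied_publishers_2002:,}")
```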

Maximum Burn

That’s the title of an Alex Kosiorek article in Radio Magazine (April 1, 2004). Kosiorek, the audio recording and mastering engineer at the Corbett Studio at WGUC-FM, Cincinnati, discusses CD-Rs, particularly for audio use and particularly when you might want to keep them for a while. CD-R “remains the most common optical media format used in audio/radio production environments,” but with higher-speed media and drives and ever-cheaper media, it can be problematic.

He offers some advice that may be useful if you’re planning to make audio CD-Rs, and particularly if you plan to use them for several years:

• Don’t use high-speed media in standalone audio recorders (you generally can’t anyway, since most consumer standalone recorders will only accept audio-certified CD-Rs). Stay away from CD-Rs labeled for 48x or higher speed; 24x and 32x may be OK. The high-speed formulations may not work properly at the 1x and 2x speeds of audio recorders.

• There’s no specific speed that will assure the fewest errors and best quality for any given medium. If you want to play it relatively safe, stay away from generic and store-brand CD-Rs, always use disc-at-once mode (standard for audio recording) rather than track-at-once mode, and try burning at roughly one-third of the drive’s maximum burn speed (which will never mean more than 16x in the real world). That will take a few minutes longer for each disc, but should keep you out of trouble.

• In his tests, Verbatim Data Life did the best, when burned at 8x or 16x. With Verbatim Plus, you know for sure where the discs are actually made (Mitsubishi Chemical); with most name brands, you can’t be sure. (Verbatim also uses a different dye formulation than most other CD-Rs; it’s obvious when you turn them over, as they’re teal or bluish-green rather than silvery.)

• Keep CD-Rs away from sunlight, heat and moisture.

• Label discs carefully, with CD-certified pens (using adhesive labels only for discs you don’t plan to keep forever).
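The one-third rule of thumb in the list above can be written down as a small helper. This is only a sketch of that rule as summarized here, not anything from Kosiorek's article: the function name, the list of selectable speeds, and the fallback to the slowest speed are all assumptions for illustration.

```python
def suggested_burn_speed(drive_max_speed,
                         available_speeds=(1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 52),
                         cap=16):
    """Pick a burn speed of roughly one-third of the drive's maximum.

    The result never exceeds `cap` (16x, per the rule of thumb) and is always
    one of `available_speeds`, a hypothetical list of speeds the burning
    software offers.
    """
    target = min(drive_max_speed / 3, cap)
    candidates = [s for s in available_speeds if s <= target]
    # If even the slowest speed is above the target, fall back to the slowest.
    return max(candidates) if candidates else min(available_speeds)


for drive in (24, 32, 48, 52):
    print(f"{drive}x drive: try burning at {suggested_burn_speed(drive)}x")
```

Rounding down to the nearest offered speed errs on the side of caution, which fits advice aimed at discs you plan to keep for several years.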

Quicker Takes

Joi Ito may be one of the new gods of the internet, but based on a little Wired item he might want to learn to ask about prices. Going on a business trip, he got a new cell phone that allowed him to connect his notebook to the Internet via GPRS. Internet everywhere. “It was sooo cool…” The access rates were on the company’s website, but I guess it’s like actually listening to speakers at a conference: The A-list can’t be bothered. His monthly bill was for $3,516.46—with $2,825.28 of that being data roaming charges. Reading 28MB of blogs in a car bound for the Zurich airport: $422.32. (28MB of blogs? I guess with moblogging and photos, and when you’re in Ito’s A-list position, that makes a curious sort of sense.)
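To put Ito's bill in perspective, a little arithmetic on the figures quoted above gives the effective price per megabyte and the share of the bill that went to data roaming. The variable names are mine, and the per-megabyte figure assumes the 28MB and $422.32 describe the same session, as the item implies.

```python
monthly_bill = 3516.46       # total bill, in dollars
data_roaming = 2825.28       # data roaming portion of the bill
zurich_session_cost = 422.32 # cost of the car ride to the Zurich airport
zurich_session_mb = 28       # megabytes of blogs read on that ride

cost_per_mb = zurich_session_cost / zurich_session_mb
roaming_share = data_roaming / monthly_bill

print(f"Effective rate: ${cost_per_mb:.2f} per MB")
print(f"Data roaming share of the bill: {roaming_share:.1%}")
```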

Harry McCracken offers an interesting perspective in his “Up Front” column in the June 2004 PC World: “The more operating systems, the merrier.” No, he’s not a Linux convert. Instead, he ordered an Apple PowerBook to complement his two Windows desktops. “At the moment using both OSs seems utterly natural.” He finds himself a “Mac snob on a part-time basis” and notes, “Odds are that the next computer I buy will be another Windows box, but I’m glad I realized that the Mac remains a viable option—even for a mostly Windows guy like me.” Read the one-page column for more detail. (Would I consider a Mac under the right circumstances? I would and have, but the circumstances have never been right for me.)

If you’ve disdained email for its virus proclivities, don’t believe instant messaging is safe. It isn’t. Viruses and worms increasingly spread via IM, partly because it’s a “softer target” in many cases. Good antivirus programs will catch most IM attacks, but you need to be as thoughtful about the links and attachments in IM as you would be in email.

Which Windows is most secure? According to Russ Cooper of TruSecure (in a recent presentation in Australia), newer isn’t necessarily better, depending on how you measure. He tracked the number of patched vulnerabilities in each Windows version, analyzing a total of 452 different vulnerabilities in 298 Microsoft security bulletins. His conclusion was that older is better, if all you care about is the number of vulnerabilities. That’s not all you should care about, of course, as he makes clear. And does anyone really want to run NT4.0 on a brand-new PC?

Cites & Insights: Crawford at Large, Volume 4, Number 9, Whole Issue 52, ISSN 1534-0937, is written and produced by Walt Crawford, a senior analyst at RLG. Opinions herein do not reflect those of RLG. Comments should be sent to Cites & Insights: Crawford at Large is copyright © 2004 by Walt Crawford: Some rights reserved.

All original material in this work is licensed under the Creative Commons Attribution-NonCommercial License. To view a copy of this license, visit or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.