General

PDF is now ISO 32000

The Inside PDF blog is reporting that PDF version 1.7 has passed the ballot for approval to become ISO 32000 by a vote of 13 to 1. This is the first general version of PDF to become an ISO standard, joining PDF/Archive (PDF/A) and PDF/Exchange (PDF/X).

New blog: "Available Online"

Well, it's new to me, but it's been around since April: Alastair Dunning, Programme Manager for the JISC Digitisation Programme, has started a blog called "Available Online," which will undoubtedly be of interest to readers of Digitizationblog.

"Compound Information Objects: An OAI-ORE Perspective"

The Open Archives Initiative's Object Reuse and Exchange working group has released Compound Information Objects: An OAI-ORE Perspective, which describes their "interoperability layer that is a standardized means for publishing [...] repository-specific and application-specific implementations of compound objects to the web." This is the first major technical document from the OAI-ORE group.

Next-gen captcha aids digitization

As noted on Slashdot today, Network World has an article on the work of a Carnegie Mellon researcher who is developing reCAPTCHA, an anti-spam technology that requires humans to enter hard-to-OCR text on web forms to demonstrate they are not spambots. reCAPTCHA provides plugins for WordPress, MediaWiki, and phpBB, as well as a web service. The Internet Archive is already benefiting from the output created by reCAPTCHA.

NPR "Talk of the Nation" episode on digitization

The May 11 "Talk of the Nation" featured Brewster Kahle (archive.org), Michael Hart (Project Gutenberg), and Michael Keller (Stanford University Librarian and Publisher of HighWire and Stanford University Press). The program is available on the NPR website.

Library of Congress to use Linux-based digitization systems

This article from Linux.com describes Scribe, the open-source technology developed by the Internet Archive (which is collaborating with LoC) to digitize and process books for searching. The newest versions will run on the popular Linux distribution Ubuntu instead of MS Windows. The article also describes the workflow used with Scribe.

Complexity of digitization mathematically expressed

Tom Blake, Digital Imaging Production Manager at Boston Public Library, offers this humorous but realistic formula for calculating the cost of digitizing a given quantity of print materials:


{[(linear feet) x (administrative imperative)^2] x [[1/(item level records)]/(distinct material formats)/(metadata staff hours)] x [[(public domain items)/(orphaned works - copyrighted materials)]/(risk assessment)^2] x [((unbound flat items < 11x17)/4) + (bound pages/2) + (tightly bound pages) + (3 dimensional items x 4)]} x net value of donor/300 dpi

Posted with Tom's permission.
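
For anyone who wants to plug in numbers, here is a minimal Python sketch of the formula. It is only one possible reading: the brackets in the formula as posted do not all pair up, so the grouping below is my assumption, not Tom's. The function name and parameter names are made up for illustration, and "< 11x17" is treated as a description of the items rather than a comparison.

```python
def digitization_cost(
    linear_feet,
    administrative_imperative,
    item_level_records,
    distinct_material_formats,
    metadata_staff_hours,
    public_domain_items,
    orphaned_works,
    copyrighted_materials,
    risk_assessment,
    unbound_flat_items,        # flat items smaller than 11x17
    bound_pages,
    tightly_bound_pages,
    three_dimensional_items,
    net_value_of_donor,
    dpi=300,
):
    """Evaluate the formula under an assumed bracket grouping."""
    # (linear feet) x (administrative imperative)^2
    scale = linear_feet * administrative_imperative ** 2

    # [1/(item level records)] / (distinct material formats) / (metadata staff hours)
    cataloguing = (1 / item_level_records) / distinct_material_formats / metadata_staff_hours

    # [public domain / (orphaned works - copyrighted)] / (risk assessment)^2
    # Raises ZeroDivisionError when orphaned works equal copyrighted materials.
    rights = (public_domain_items / (orphaned_works - copyrighted_materials)) / risk_assessment ** 2

    # Handling effort, weighted by how awkward each kind of item is to capture.
    handling = (
        unbound_flat_items / 4
        + bound_pages / 2
        + tightly_bound_pages
        + three_dimensional_items * 4
    )

    return scale * cataloguing * rights * handling * net_value_of_donor / dpi
```

Splitting the expression into four named factors mirrors the four bracketed groups in the formula, which makes it easy to regroup if you read the brackets differently.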

Some real data on Web 2.0 use

The University of Oxford's Technology-Assisted Lifelong Learning (TALL) group has published the results of a survey investigating use of Web 2.0 services among Oxford students (1400 students responded). The report shows which sites the respondents use and contribute to, but doesn't address how students use them.

Distributed Proofreaders

Many of you may already have heard of Distributed Proofreaders (DP), which I just came across on the O'Reilly Radar blog. DP is a network of volunteers who perform the proofreading equivalent of the double-keying method: