I just returned from Access 2006 in Ottawa. Another great year for this conference. I didn't blog the event, since I knew it was already going to be hyperblogged -- check out, for example, Loomware, onebiglibrary, Quædam cuiusdam, and Library Webchick. See also the conference site's planet, the Technorati tag, and the flickr group (some photos of speakers were taken by Dan Chudnov and his new MacBook Pro's built-in camera). I'll offer a short report on topics directly related to digitization and digital collection building, although many themes from the conference are relevant, such as Web 2.0, mashups, and removing barriers for users.
The Hackfest is a major part of Access. People suggest projects that can be chewed on in a single day by small workgroups. On the day of the Hackfest, participants choose a project, collaborate on it, and at some point during the conference, report on it. All the projects are good, but for digitization check out AJAX METS creator, The Witness Relocation Program for Metadata, and Elements for defining relevance – what the heck could they be?
Art Rhyno and Walter Lewis demonstrated the Our Ontario prototype, which brings together digital content from a variety of institutions into one search portal. The search application uses Lucene, which is emerging as the indexer of choice for a number of open source and commercial products since it provides fast searches, can search across separate indexes, and facilitates faceted browsing (something Peter Binkley demonstrated the next day in one of the Thunder Talks). Lucene deserves attention: As Art put it, "Your search engine has a major impact on 'what happens on the other side of the line'" -- that is, on the retrieval features you can offer your users.
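For readers who haven't run into faceted browsing, here is a minimal sketch of the idea in plain Python. It does not use Lucene or reflect how Our Ontario is built; the records, field names, and function names are all made up for illustration. The point is only that a facet is a count of search results grouped by a field's values, which the user can then click to narrow the result set.

```python
from collections import Counter

# Made-up records standing in for indexed documents (not real Our Ontario data).
RECORDS = [
    {"title": "Harbour view", "institution": "Library A", "type": "photograph"},
    {"title": "Town council minutes", "institution": "Archive B", "type": "text"},
    {"title": "Main Street parade", "institution": "Library A", "type": "photograph"},
    {"title": "Harbour survey map", "institution": "Archive B", "type": "map"},
]


def search(records, term):
    """Return records whose title contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r["title"].lower()]


def facet_counts(results, field):
    """Count how many results fall under each value of the given field."""
    return Counter(r[field] for r in results)


def narrow(results, field, value):
    """Keep only the results matching a facet value the user selected."""
    return [r for r in results if r[field] == value]


hits = search(RECORDS, "harbour")
print(facet_counts(hits, "type"))           # counts shown beside the result list
print(narrow(hits, "institution", "Library A"))  # what a click on a facet would return
```

A real indexer computes these counts from the index rather than by looping over records, which is a large part of why it stays fast on big collections.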
Ron Davies gave a useful session summarizing digital library activity in the European Union. Interoperability and multilingual access are priorities for the EC's digital library, and since December 2004 the EU has been very aggressive in supporting digitization of and access to cultural material in support of the Lisbon Agenda, specifically through the i2010 program, which aims to make at least 6 million primary documents available by 2010. Ron also described independent activity in England and France, as well as the European Library (TEL), a portal bringing together much of this content.
Several of the Thunder Talks, a series of 10-minute presentations each demonstrating a project or application, are relevant to digitization. I presented my "Drupal Hacks for Libraries," which included a rushed demo of my DLCMS, or Digital Library Content Management System (stay tuned to digitizationblog for more on that project in a little while). As mentioned above, Peter spoke about "Faceted search with Solr"; Solr is a search server that exposes Lucene as a web service.
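Because Solr is queried over HTTP, a faceted search boils down to a URL with the right parameters. The sketch below is my own illustration, not something shown in Peter's talk: it builds such a URL using Solr's `q`, `rows`, `wt`, `facet`, and `facet.field` request parameters. The base URL and the field names `institution` and `type` are assumptions; yours will depend on your install and schema.

```python
from urllib.parse import urlencode


def solr_facet_url(base_url, query, facet_fields, rows=10):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [
        ("q", query),
        ("rows", rows),
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # turn faceting on
    ]
    # facet.field is repeated once for each field you want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr address and schema fields, for illustration only.
url = solr_facet_url(
    "http://localhost:8983/solr/select",
    "harbour",
    ["institution", "type"],
)
print(url)
```

Fetching that URL from any language that can speak HTTP returns the matching documents along with the facet counts, which is what makes Solr easy to put behind a PHP or Drupal front end.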
Stan Ruecker, in a talk titled "Interfaces for the Dynamic Visual Grouping of Data During Browsing", provided a number of examples of information visualization for retrieval, including a retrieval system based on small visual representations of pills; an application for querying pictures of human faces by attributes like hair color, whether the person wears glasses, etc.; and Mandala, which is a generalized visual retrieval browser. Neat stuff.
Cliff Lynch is a must-see at Access, and I want to report here his observation that large-scale digitization projects are finally taking bit-stream preservation seriously during their planning stages, unlike in the past, when that necessary function was simply assumed. Proactively managing the files these projects produce is of course necessary to preserving them.
There was a lot more of interest at Access, and as usual, talking to old friends and getting to meet new ones was an essential component of this conference. No matter how enjoyable the event is, I never stop looking forward to next year's.
Comments
[...] The Apache Lucene search engine is probably the most widely adopted open-source search engine. It is, in fact, gaining huge popularity in the digital collections world (as evidenced, for example, by the Lucene workshop at the recent Code4Lib conference). As Mark Jordan noted in his report on the Access 2006 library conference, “Lucene is emerging as the indexer of choice for a number of open source and commercial products since it provides fast searches, can search across separate indexes, and facilitates faceted browsing.” [...]
Hi Mark. Sounds like a great conference. Wish I could have been there.
RE: Lucene and Drupal.
Zend has created a PHP5 port of the Lucene search engine as part of their Zend Framework. See http://framework.zend.com/manual/en/zend.search.html
It is intended to be a one-to-one port of the original (Java-based) Lucene engine, although it doesn't yet have all the features of the original.
It is very easy to implement. I have used it as the default search engine in the Archival Description software system that I developed recently using the Symfony PHP5 platform (see http://archivemati.ca/2006/10/02/ica-atom-alpha-v01-release/).
I imagine it shouldn't be that difficult to integrate it into your Drupal CMS as well, which, I assume, is also running on top of PHP version 5.