Constructing The Electronic Beowulf
Andrew Prescott
In 1992, Andrew Prescott attended, on the British Library's behalf,
a conference at the University of Wisconsin in Madison to discuss
a project to produce a microfiche edition of all surviving Old
English manuscripts, the Anglo-Saxon Manuscripts in Microfiche
project. This conference initiated a very successful project,
which is not only producing cheap, high-quality microfiche reproductions
of the entire manuscript corpus of Old English but also, just as
importantly, providing new and up-to-date
descriptions of all these manuscripts. For all its virtues, this
project may nevertheless come to be regarded in future years
as one of the last gasps of a microfilm technology which, since
the Second World War, has served scholars very well in providing
large-scale access to manuscripts and other rare source materials,
but which was always cumbersome to use and prone to deterioration,
and whose images were a pale shadow of the original. The
high quality colour images which digital cameras can now produce
will, once scholars become accustomed to their use, make them
unwilling to settle for anything less (provided that libraries
do not make exorbitant charges for them).
Digital cameras offer colour images which show more clearly than
microfilm or black and white facsimiles such key details as abbreviation
and punctuation marks or erasures. The images will not degrade
with use. Details can be magnified, and different parts of the
manuscript (or different manuscripts) compared side by side on
the screen. However, these advantages come at a cost. In particular,
the large size of image files means that users require very powerful
computers to access them, and storage and transfer of the files
can be a difficult task. Already in 1992, some scholars were suggesting
that the Anglo-Saxon Manuscripts in Microfiche series should
be based on digital images rather than microfilm. The difficulties
the editors of this series have encountered (and heroically overcome)
in obtaining permissions from many different manuscript owners
and arranging for their manuscripts to be filmed suggest that
this is an unfair criticism. If the series had been burdened with
the additional need to cope with the use of digital imaging techniques
which are still in their infancy, it would never have achieved
its primary aim of rapidly making available cheap and easily-accessible
reproductions of all Old English manuscripts. However, the microfiche
project did, indirectly, give a great fillip to the use of digital
technology in medieval studies in that the initial meeting in
Madison proved to be the starting point of the Electronic Beowulf
project.
At Madison, Andrew Prescott met for the first time two of the
world's most eminent Old English scholars, Professor Paul Szarmach
and Professor Kevin Kiernan. Professor Kiernan is Professor of
English at the University of Kentucky and is the leading expert
on the history of the Beowulf manuscript, Cotton MS. Vitellius
A.xv, one of the British Library's greatest treasures. He wrote
a controversial study of the manuscript in 1981 which for the
first time gave a detailed account of its history and proposed
that the composition of the poem was contemporary with the manuscript
(the prevailing view had been that the poem was considerably older
than the manuscript). He also published the first full analysis
of the important early transcripts of the manuscript by the Danish
antiquary Thorkelin. Professor Szarmach was at that time at the
State University of New York and the author of a number of important
studies of Anglo-Saxon manuscripts. As the editor of the Old
English Newsletter, he was renowned as one of the great academic
entrepreneurs of Old English studies, a role which he has been
able to extend considerably since his appointment in 1994 as the
Director of the International Medieval Institute at Western Michigan
University, Kalamazoo.
During the Madison conference, Professor Szarmach asked Andrew
Prescott (during a conversation in the men's lavatory, of all
places) how he felt the British Library would react to a proposal
to digitise the Beowulf manuscript. Knowing that the British
Library was starting to take an interest in investigating digital
imaging, Prescott replied that he thought that the time was just
right for such a proposal. Further discussions with Professor
Szarmach over a Thanksgiving Day lunch during a visit by him to
London in 1992 gave added impetus to the proposal. Our first action
was to get in further contact with Professor Kiernan. This involved
establishing an e-mail link from the British Library, at that
time still an unusual facility in the British Museum building.
At first, all this achieved was the circuitous relay of dramatic
descriptions of winter weather in New York by Professor Szarmach,
but as it became possible to discuss the project in greater detail,
it proved to have more exciting possibilities than at first realised.
The crux of Kiernan's 1981 book was that the Beowulf manuscript
was more complex in character than might have been expected if
it was assumed that the poem was transmitted by word of mouth
and not written down until sometime after its composition. He
drew attention to one folio which he argued might have been a
palimpsest folio - one in which the original text had been erased
and replaced with something else. This suggestion was not well
received by the scholarly community, but it is evident that something
very strange has happened to the manuscript at this point. Kiernan
suggested that digital image processing might reveal what was
going on, and as early as 1984 had experimented with making videotapes
of readings under particular lighting conditions to provide input
to a medical imaging machine. Three years later, he invaded the
Department of Manuscripts of the British Library with massive
medical imaging equipment to see if this would assist in interpreting
the manuscript. This experiment improved the legibility of some
sections of the page and raised some doubts about accepted readings,
but by no means resolved the problems raised by this folio. However,
Kiernan was conscious that, as the technology improved, it might
assist in investigating these numerous doubtful or uncertain points
in the manuscript.
Moreover, these were not the only points at which it seemed that
digital technology might assist in exploring the manuscript. The
manuscript had been badly damaged in the fire which in 1731 ravaged
the library formed by the sixteenth-century antiquary Sir Robert
Cotton. Eighteenth-century conservation techniques had proved
unequal to the task of stabilising the condition of the manuscript,
and it was left unprotected for over a hundred years. Following
its transfer to the British Museum in 1753, use of the brittle
and smoke-stained manuscript led to serious textual loss, with
pieces probably being left on the Reading Room floor every time
it was used. It was not until 1845, when the binder Henry Gough,
working under the supervision of the Keeper of Manuscripts, Sir
Frederic Madden, mounted each leaf of the manuscript in a protective
paper frame, that the condition of the manuscript was stabilised.
However, in order to have a retaining edge for the paper frame,
Gough was forced to cover the edge of the verso of each leaf,
hiding hundreds of letters from view. This may
seem unfortunate, but at least these letters did not disappear
in a dustpan, as would otherwise have happened.
This conservation strategy was triumphantly vindicated by Kiernan
in 1983, when he showed that by using a cold fibre-optic light
source, which would not harm the manuscript, the concealed letters
could be deciphered. However, it was not possible to produce a
facsimile of the hidden letters. In order to read the letter,
it was necessary to hold the fibre-optic cable at an oblique angle,
and the letter could quickly disappear from sight as the angle
at which the cable was held changed. Kiernan guessed that, with
a conventional camera, it would be impossible to know, by the
time the shot had been taken and the film processed, if the elusive
reading had been correctly captured. Subsequent tests showed that
this was indeed the case and that these hidden letters could not
be recorded with a conventional camera. Given the contentious
nature of Beowulf studies, where discussions of single
readings can generate great academic controversies, and bearing
in mind that some of the hidden letters represented part of the
only known record of some Old English words, the need to find
a method of recording readings made with fibre-optic lights was
pressing. Kiernan was anxious to see how far a digital camera
could help.
The project was, then, quickly progressing beyond the simple production
of straightforward electronic scans of the manuscript of the Beowulf
poem itself, which was what Szarmach and Prescott had
envisaged in their original discussion in the Madison lavatory,
and which Prescott had, at one point, optimistically suggested
would only take a few weeks to complete. In order to understand
the context of the Beowulf section of the manuscript, it
was clearly necessary to provide scans of the rest of the section
of the manuscript in which it is contained, known as the Nowell
Codex. It would also be worth exploring how far the concealed
letters in the rest of the Nowell Codex could be recorded. Cotton
MS Vitellius A.xv in fact consists of two separate and unrelated
manuscripts, bound up together probably by Sir Robert Cotton.
The other manuscript in the volume is known as the Southwick Codex,
and it would be helpful in conveying the full context of the Beowulf
text to provide an electronic facsimile of the Southwick Codex
as well.
Because of the fire damage to the Beowulf manuscript, transcripts
and collations of the poem made in the eighteenth and early nineteenth
centuries provide vital evidence of the text. The earliest sets
of transcripts, dating from the 1780s, belonged to the Danish
antiquary Grímur Jónsson Thorkelin, and are in the
Royal Library, Copenhagen. The first, known as Thorkelin `A',
was made for Thorkein by an English copyist who was not familiar
with Old English script but made a brave attempt to reproduce
the appearance of the letters in the manuscript. The second, Thorkelin
`B', is in Thorkelin's own hand. These transcripts record many
words and letters which afterwards crumbled away. Thorkelin published
the first edition of Beowulf in 1815. Two years later,
John Conybeare made a detailed comparison of Thorkelin's text
with the original manuscript, recording his findings in an interleaved
copy of Thorkelin's edition, noting which letters had vanished
in the time since Thorkelin looked at the manuscript. This, the
first collation of Thorkelin's edition, was in the possession
of Professor Whitney Bolton of Rutgers University. Professor Bolton
kindly donated this fascinating volume to the British Library
to facilitate its scanning, a generous gift for which the British
Library would like to record its thanks. A more accurate collation
of Thorkelin's edition with the manuscript was made in 1824 by
Frederic Madden, afterwards Keeper of Manuscripts at the British
Museum. Madden's collation, prefaced by an eerily realistic drawing
of the first page of Beowulf, is now, with other annotated
books and transcripts by him, in the Houghton Library at Harvard
University.
The scope of the project, then, grew in the course of our initial
discussions, from an electronic facsimile of one section of a
manuscript into a collection of images of all the primary evidence
for the Beowulf text. This implied not so much the production
of an electronic edition as the creation of a digital archive
recording the history of Beowulf. As this view of the project
developed, it suggested a different approach to the electronic
edition from that espoused by other kindred projects. In such well-known
textual electronic editions as the Canterbury Tales Project
and the Piers Plowman Project, conventional editions are
being prepared as SGML-tagged text. It is hoped in both cases
to use computers to compare the different witnesses of the text
in order to establish an authoritative original text, an ur-text
- a very conservative view of the editorial task. In
the case of the Beowulf project, the aim is not to arrive
at a definitive text, but to expose the layers of evidence for
the text, which are obscured in printed editions. The Electronic
Beowulf in essence seeks to dissolve the text into its constituent
parts. While the Canterbury Tales and Piers Plowman
projects will include images of manuscripts, these are ancillary
to the main purpose of both projects. By contrast, the Electronic
Beowulf seeks to confront the user with the different types
of evidence on which his understanding of the text depends, so
that it is the images which are central to the project, not their
interpretation into SGML-tagged text. One way of describing The
Electronic Beowulf might be as a diplomatic edition done with
pictures instead of words, but even this does not convey the radical
nature of the edition.
The Electronic Beowulf is therefore not simply an experiment
in applying new technology to a famous manuscript. It represents
a coherent and subversive challenge to the tradition of editing
texts. In this respect, it reflects the views of Kevin Kiernan
as to the nature of the Beowulf text and draws together
themes he has been developing in his work for nearly twenty years.
Kiernan has not only provided the intellectual vision behind the
project, but has also been the main driving force throughout and,
above all, has undertaken an immensely complex editorial task
in splicing the different images into a coherent whole. The project,
however, has involved a much larger team of people both in America
and England. Indeed, perhaps the most exciting part of the project
has been this Anglo-American collaboration.
The British Library provided the equipment for image capture and
the necessary technical, curatorial, photographic and conservation
staff to supervise the scanning of the manuscripts. This was complemented
by a similar investment by the University of Kentucky, which gave
Kiernan access to its Convex mass storage system to store the
images, provided him with a powerful Hewlett Packard Unix workstation
to work on the image files, and also gave essential technical
support. It was evident from the beginning of the project that
the British Library would never be able to provide the curatorial
resources to undertake detailed work on the images. Integral to
the concept of the project from the beginning was therefore the
assumption that external funding would be sought to provide Kiernan
with time to put together the final product. This funding was
provided by the Andrew W. Mellon Foundation which gave Kiernan
a grant to release him from teaching duties for a year. The National
Endowment for the Humanities also helped fund Professor Kiernan's
travel costs. One of the chief lessons of the Initiatives for
Access programme is that it will be difficult for libraries
to find the extra staff resources to undertake projects using
the new technologies. The Electronic Beowulf suggests that
one way of avoiding this problem is collaboration with external
partners. A collaborative approach is strewn with pitfalls. That
it was successful in this case has been due to the enthusiasm
and commitment to the project of both Kiernan and the University
of Kentucky.
As a result of these preliminary discussions, it was possible
to put a detailed proposal to the British Library's Digital and
Network Services Steering Committee, which directed the Initiatives
for Access programme, in the summer of 1993, and the committee
agreed to provide initial funding for the project. It is interesting
to note that the initial documentation for the project assumed
a completion date of 1998. The CD-ROM is now scheduled for release
in 1997, a year ahead of schedule. The Library funding allowed
the appointment of John Bennett of Strategic Information Management
to manage the purchase of equipment and establish the procedures
for image capture.
The most urgent question was selecting a suitable camera for use
in the project. In general, the best route appeared to be a digital
camera. Anything that involved direct contact of the scanning
device with the manuscript was unacceptable on conservation grounds,
which ruled out flatbed scanners. (It should be noted, however,
that where photocopying of documents is permitted, as in large
modern archives, there would be no objection to the use of flatbed
scanners, and this would be a perfectly acceptable way of scanning
large quantities of loose modern documents.) Scanning from photographic
negatives might have been an acceptable way of proceeding if all
that was envisaged was a simple digital facsimile of the manuscript,
but such an approach would not have permitted the shots under
special lighting conditions, such as the fibre-optic shots, envisaged
in the project.
Moreover, the consensus among textual scholars who have worked
with digital images is that direct scanning gives a more legible
reproduction of the manuscript than any process involving a photographic
intermediary. Peter Robinson, for example, in comparing images
of the same manuscripts, points out that such details as faint
hairline abbreviations or damaged text appear more clearly on
images made with a digital camera than on equivalent PhotoCD images:
`The detail, accuracy, and range of the colour [in the digital
camera images] were such that in every case it was as if one were
reading the manuscript itself, but the manuscript enlarged and
printed on the computer screen.' By contrast, text that was readable
in these images became unreadable on PhotoCD: `The PhotoCD images
also seemed flat, lacking in colour range and contrast... Overall,
the effect of the PhotoCD images was that one was looking at a
very good reproduction of the manuscript, equivalent to a good
microfilm or a well printed facsimile'. Robinson ascribes the
limitations of PhotoCD to the resolution limits of the Kodak scanner,
so that it `cannot give an image better than that provided by
the best colour microfilm'. Against the better quality of the
images produced by direct scanning, however, must be set the greater
wear and tear on the original object produced by this process.
In early 1993, David Cooper and Peter Robinson of Oxford University
arranged for the demonstration in the Library of a high resolution
image capture system manufactured by a company called Primagraphics.
This offered very high resolutions (in theory, up to 5000 x 7000
pixels), but the demonstration immediately showed that the colour
quality was poor. Moreover, focussing of the camera was dependent
on very cumbersome histogram adjustments and access to the images
required expensive custom-made programmes. Dissatisfied with this
demonstration, Kiernan made an investigation of the digital cameras
available in the United States. A very successful demonstration
was arranged at Kentucky of the ProgRes 3012 camera manufactured
by a German medical imaging company, Kontron, marketed in America
by Roche. This camera offered a lower resolution than
the Primagraphics system (3096 x 2320 pixels; an upgrade offering
4490 x 3480 pixels is now available) but produced full 24-bit
colour. Moreover, the Kontron camera had a convenient real-time
focussing facility similar to adjusting a conventional camera,
with areas of over-exposure indicated on-screen. A feed was also
provided to a black and white monitor which was of great assistance
in setting up shots under special lighting conditions. The camera
could be used under all three main operating systems: PC,
Mac and Unix. The camera had been used in the remarkable VASARI
project at the National Gallery, which had demonstrated that digital
images could provide a superior record of the condition of ancient
artefacts to conventional photography. As a result
of Kiernan's recommendation of this camera, it was not only used
for the Electronic Beowulf project, but was discussed very
favourably in Peter Robinson's guide to The Digitization of
Primary Textual Sources (1993) and was purchased by the University
of Oxford and the National Library of Scotland. It has also been
the preferred camera of the ACCORD group in the United States,
which has established a listserv discussion group devoted to issues
associated with scanning manuscript materials using this camera.
The technical features of the camera have been lucidly described
by Peter Robinson in The Digitization of Primary Textual Sources,
so the discussion here will concentrate on the practical lessons
which have been learnt from its intensive use in the Beowulf
project. An immediate issue was lighting. Flash lighting is usually
preferred for manuscript photography, as it provides the best
colours with minimum exposure of the manuscript to light and heat.
Since the Kontron camera, when used on a Windows 3.1 platform,
requires a fifteen-second scan in which the charge-coupled device
(CCD) array moves physically down and across the image, flash
lighting could not be used. The photographer assigned to the project,
Ann Gilbert, advised that cold fluorescent lighting would
not produce good colours, so the decision was made to use two
2KW photoflood lights similar to those used for video filming.
The use of such intensive light is obviously a major issue for
the conservation of the manuscript. Exposure to light (measured in
lux, the international unit of illuminance) and to heat will accelerate
the decay of a vellum manuscript. Not only will the light bleach
the ink, but the vellum will visibly move in the heat of the light.
When manuscripts are exhibited, the international standard is
that a manuscript should not be exposed to more than 50 lux. The
photofloods required for scanning with Kontron camera produced
hundreds of lux.
A detailed analysis was made by the Library's conservation staff
of the effect on the manuscript of exposure to light during the
digitisation process. It should not be assumed that, simply because
light levels are much higher than the 50 lux in the exhibition
galleries, this will automatically damage the manuscript. After
all, the ambient light in, say, the Library's Manuscripts Reading
Room is about 700 lux, so that every time the manuscript is used
by a reader, it is exposed to higher light levels. Moreover, since
the issue is one of ageing over time, it is not light levels from
a single exposure which are important - annual light exposure
is a more important consideration. Display in the exhibition galleries
subjects the manuscript to an annual light exposure of approximately
180,000 lux-hours. It was calculated that, during the digitisation of
the manuscript, the light exposure would be equivalent to five
years' display in the gallery. However, since a photographic facsimile
of the Nowell Codex has been available for some time, there has
not been much photographic work on the manuscript in recent years,
making the extra light exposure acceptable. Viewed from another
angle, the scanning of the manuscript involved an exposure to
light equivalent to about a year's continuous use in the higher
light levels of the reading room. If, as is hoped, the provision
of an electronic colour facsimile substantially reduces the frequency
with which the original manuscript needs to be consulted by readers,
then the digitisation process will have had no discernible effect
on the overall life of the manuscript.
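The arithmetic behind this reasoning can be reconstructed in a few
lines. Python is used here purely as a convenient notation; the figure
of 3,600 display hours a year is an inference from the 50 lux standard
and the 180,000 lux-hour annual total quoted above, not a figure given
in the project records.

    GALLERY_LUX = 50              # exhibition standard quoted above
    READING_ROOM_LUX = 700        # ambient light in the reading room
    ANNUAL_DISPLAY_HOURS = 3_600  # inferred: 180,000 lux-hours / 50 lux

    annual_gallery = GALLERY_LUX * ANNUAL_DISPLAY_HOURS  # 180,000 lux-hours
    digitisation = 5 * annual_gallery                    # 900,000 lux-hours

    # The same dose expressed as time under reading-room lighting:
    print(digitisation / READING_ROOM_LUX)  # ~1,286 hours, i.e. roughly
                                            # a year of reading-room use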
It will be evident from this discussion that the various factors
that have to be taken into account in considering the effect of
light exposure on the manuscript are very complex, and need to
be considered on a case by case basis. Illuminated manuscripts
are, for example, generally too delicate for scanning of this
kind. Maps with wash colouring would also be prone to damage.
This is a serious limitation, as these are categories of material
for which digital images would be particularly useful. Because
of these considerations, close conservation supervision of the
process has been required throughout. We have been exceptionally
fortunate, in that the supervising conservator, David French,
has become deeply interested in the techniques of digital scanning,
and has effectively provided the main technical support for the
project. Nevertheless, the requirement for conservation control
has helped make the scanning extremely labour intensive, and conservation
issues have limited the range of material which can be scanned
with the camera.
Thus, the advice offered by Peter Robinson, `one should digitize
from the original object wherever possible', is oversimplistic
in that it takes no account of the conservation issues involved
in scanning the object. Ideally, one would want to use a different
light source with the Kontron camera. The best possible arrangement,
from a conservation point of view, would be to use a fluorescent
light source, but the experience of Oxford University in using
such a setup has not been encouraging. The Oxford project for
the scanning of early Welsh manuscripts found that, due to the
construction of the CCD, the use of cold lighting with the Kontron
camera created a `Newton's ring' effect, causing patterning on
the image. The Oxford University team, led by David Cooper, were
able to remove this pattern by post-processing of the image, but
clearly this is unsatisfactory as a long-term solution to the
problem. The Oxford project found that images comparable in quality
(and indeed with the possibility of higher resolution) could be
created with a digital back device, a scanner mounted on a conventional
camera. Recent work by the Manuscripts Photographic Studio at
the British Library has also shown that digital back devices work
very well under such a light source. Currently, Kontron continue
to recommend that cold lights should not be used with the ProgRes
camera, and this problem is likely to limit the usefulness of
this camera for large-scale direct scanning.
Conservation issues of this kind are bound to remain a major
concern in developing projects involving direct scanning of
manuscript materials. Although fluorescent light sources are generally
described as `cold', the powerful lights required for scanning
still generate a great deal of ambient heat, and the length of
time required for the scanning process means that the light exposures
involved are still much higher than for conventional photography.
Moreover, there is a risk that the development of the technology
will create greater pressure to rescan the manuscript, not less.
Every time a manuscript is rephotographed or rescanned, it is
subjected to more light exposure and general wear and tear, shortening
its overall life. Although a digital image (if properly stored
and maintained) will not degrade like a conventional photographic
negative, it is likely that in, say, four years' time a 400 dpi
shot generated by a Kontron camera will look very crude by comparison
with images produced by the latest generation of digital cameras.
There will therefore be a demand to scan the manuscript again.
Trying to balance conservation issues against a wish to provide
the best possible surrogate images of a manuscript will be a major
issue for the custodians of the manuscript.
Despite these conservation issues, the images produced by the
Kontron camera throughout the project have been outstanding. The
camera has been exceptionally reliable and versatile - the only
piece of equipment at the London end of the project with which
there have been no problems. The performance of the camera was
particularly impressive for shots under special lighting conditions.
Although for conservation reasons other devices might now be preferred
to produce straightforward colour images of manuscripts, these
have yet to be tested with the arduous work undertaken by the
Kontron camera in the Beowulf project, and it seems likely
that they will be too cumbersome to undertake this kind of highly
specialised work.
It has been known for a long time that ultra-violet light enables
erased or damaged text to be read. Indeed, among the first experiments
in ultra-violet photography were attempts to read damaged portions
of the Beowulf manuscript in the 1930s. Conventional ultra-violet
photography characteristically requires very long exposure times,
sometimes as long as fifteen to twenty minutes. This is hazardous
for both the manuscript and the operator. It was found that ultra-violet
images could be made with the Kontron camera in the normal fifteen-second
scan time. The camera needed to be recalibrated under ultra-violet
light to take these shots, and fairly powerful ultra-violet light
sources were required. Since the Kontron camera only takes colour
images, the first results were little more than a murky blue patch,
but when the contrast of these was adjusted and they were transferred
to grey-scale, clear images of readings under ultra-violet light
were obtained. Similarly, the camera is very useful for infra-red
photography. Infra-red light is not very helpful with the Beowulf
manuscript, but can be essential for other types of
material. The Library possesses a collection of more than four
thousand Greek ostraca (potsherds with writing on them), many
of which can only be read under infra-red light. It was found
in tests that the Kontron camera produced very clear images of
these under infra-red light. The only complication in producing
these shots was that the infra-red filter in the camera had to
be removed.
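The processing applied to the ultra-violet shots - conversion to
grey-scale followed by a contrast stretch - is simple enough to sketch.
The few lines below use a modern imaging library purely by way of
illustration (the project itself worked with packages such as Photoshop,
and the file names here are invented):

    from PIL import Image, ImageOps

    uv_shot = Image.open("uv_capture.tif")  # the murky blue raw capture

    grey = ImageOps.grayscale(uv_shot)      # discard the blue colour cast
    reading = ImageOps.autocontrast(grey, cutoff=1)  # stretch contrast,
                                                     # clipping 1% outliers
    reading.save("uv_reading.png")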
Where the Kontron camera showed its greatest versatility was in
taking images of the sections of vellum concealed beneath the
paper frames protecting each folio, revealed by fibre-optic lights
positioned behind the frame. Considerable experimentation was
required to arrive at the best procedure for taking these shots.
Initially it was hoped that an A4 fibre-optic pad, laid underneath
the folio, would reveal large areas of concealed text, but this
light source was not sufficiently powerful. Eventually, it was
found necessary to use two or more fibre-optic cables clamped
into position behind each letter or group of letters. The camera
had to focus on a small area of very powerful light, but produced
very readable images. Subsequent image processing made the lost
letters even more legible. Under the direction of Professor Kiernan,
hundreds of such images have been shot, very demanding work as
the light had to be set at a very precise angle to capture the
hidden letters.
The acquisition by the British Library of a video-spectral comparator
(VSC) in the 1970s, which enables manuscripts to be examined under
a variety of precisely controlled light levels, marked a break
away from traditional manuscript conservation work, which focusses
on the physical repair and preservation of manuscripts. For the
first time, conservators started to become involved in using technical
aids to recover further information about a manuscript which could
not be seen with the naked eye. The use of the Kontron camera
in the Beowulf project marks a further important step in
the development of this activity. The project has barely begun
to explore the possibilities of this technology. The Kontron camera
can, for example, be mounted onto microscopes and magnifiers,
which will enable very fine details of manuscripts to be recorded
and investigated. This application of the camera has not so far
been tested. At present, there is no means of mounting the Kontron
camera on the VSC. A frame-grab facility has been installed on
the Beowulf equipment, which allows digital images of readings
obtained by the VSC to be made, but these are low-resolution,
grey-scale images. A link between these two pieces of equipment
which would allow high resolution images to be taken under particular
light settings would be a desirable next step in the development
of the digital conservation studio which is gradually taking shape
in the Library.
The Kontron camera operates on a PC, Mac or Unix platform. It
was decided to run the camera for the Beowulf project on
a PC, running the latest version of Windows (3.1, and afterwards
3.11). This was a purely pragmatic decision. Although it was recognised
that Unix and Mac offered a superior environment to the PC for
the handling of images, Mac and Unix support in the Library was
very limited at the time that the project began, and it was felt
that it would be difficult enough to learn to use a new camera
without having to master a Mac or Unix machine as well. In fact,
despite the frustrations involved in the use of a different platform
in London from those used in America, this was a worthwhile experiment,
since the experience of regularly transferring images across from
PC to Mac or Unix gave a good insight into the perils of using
different operating platforms.
The uncompressed TIFF produced by the camera operating at full
resolution is just under 21 mb in size. This required the use
of much larger PCs than were commonly available in 1993. Indeed,
at that time the suppliers of the camera in Britain, Image Associates
of Thame, did not often use the full resolution and were operating
the camera on a 486 machine with 16 mb of RAM (upgraded to 32
mb for a demonstration in the Library). Initially, the PC purchased
from Image Associates to run the camera was a 486 DX2/66, with
an EISA system, with 64 mb of RAM (subsequently upgraded to 96
mb) and a 1 gigabyte hard disc. A 486 system was preferred as
Pentium machines were then only just becoming available, and were
by no means reliable. A Matrox 1024 video card was used, with
a 21" monitor. This PC was subsequently upgraded to a Pentium
P60 running a local VESA bus board, and the hard drives have been
progressively upgraded to the present 5 gigabytes on the image
capture machine (split 3, 2; the 3 gigabyte drive is further split
2, 1, because of the difficulty DOS has in addressing more than
2 gigabytes at a time). Two similarly configured PCs were acquired
for off-line processing, giving a further 10 gigabytes of local
storage.
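The figure of just under 21 mb per image, which drove these hardware
requirements, follows directly from the camera's resolution and colour
depth. A short illustrative calculation (Python used purely for
demonstration):

    # 3096 x 2320 pixels of 24-bit colour is three bytes per pixel,
    # before any TIFF header overhead.
    width, height, bytes_per_pixel = 3096, 2320, 3
    print(width * height * bytes_per_pixel)          # 21,548,160 bytes
    print(width * height * bytes_per_pixel / 2**20)  # ~20.5 mb per image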
The camera ran initially on WinCam, a fairly simple proprietary
image capture Windows programme supplied by Kontron, running on
a 16-bit AT card. The image handling software available for the
PC at the beginning of the project was limited. Halo Desktop Imager
was initially used, at the recommendation of Image Associates,
because it could load the full TIFF and was less memory-intensive
than other image handling packages available at that time. Adobe's
Photoshop was released for the PC shortly after the project began
and is now the standard tool within the project for handling images.
During 1996, Kontron produced a plug-in allowing images to be
captured from within Photoshop. This ostensibly gives much more
sophisticated control over the images than WinCam, but problems
have been found with the colour of images made using this plug-in.
It is to be hoped that the plug-in announced by Kontron for Photoshop
4.00 will prove more reliable.
Broadly, the PCs were satisfactory for simple capture and viewing
of images, but unsuitable for quality control and image manipulation.
The DOS/Windows architecture does not address all the RAM available
in the system, and this makes the performance slow and prone to
locking when processor-intensive operations like rotation or the
display of a number of different images are involved. The large
hard discs were also initially physically unreliable and prone
to failure. The most spectacular hard disc problem was probably
a complete failure during the first visit of a journalist to the
project in early 1994. The off-line machines have recently been
upgraded to an NT server workstation and client, and the improvement
of performance was immediately evident. By contrast, the HP Unix
workstation at the University of Kentucky, with 128 mb of RAM,
5 gigabytes of local storage and ethernet access to a large scale
Convex storage system, has proved extremely stable and reliable
in its handling of large image files, as have the various powerful
Macs used there.
More serious have been the problems with colour. Since Windows 3.11
is not a 32-bit operating system, it does not display the full
24 bits of the colour images. Although it was found that the local
PCs could be successfully calibrated to match the colours of the
manuscript, these were displayed differently when transferred
to the Unix and Mac machines at Kentucky. By trial and error,
a calibration was found which gave the best results at Kentucky.
However, it was clearly unsatisfactory from the point of view
of quality control that the operators in London could not be exactly
sure how the images would appear when viewed in America. It is
hoped that the implementation of an NT4 platform for image capture
in London will eventually resolve this problem.
The contrast in display of the images between the PCs in London
and the Unix or Mac machines in Kentucky has been dramatic. The
images look good when seen on the PC in London, but viewed on
the Unix machine in Kentucky, they look majestic. The difference
between PC and Mac has also been evident in presentations of
the project, where the superiority of the Mac images in presentations
prepared at Kentucky over the PC images from presentations prepared
in London has been (embarrassingly for the London representative)
obvious. In general, my feeling is that, if there is one change
I would have made in the way the project was organised, it is
that we should have come to grips with using a Mac or Unix machine
to run the camera in London.
The other major technical issue has been the storage of the large
quantities of data produced by the project. The scan time of the
camera working on a Pentium machine under Windows 3.1 is about
15 seconds (this is considerably reduced when Windows NT or Windows
95 is used). This means that the speed of scanning can potentially
rival microfilm, but something has to be done with the data -
over 2 gigabytes can easily be produced in a normal working day.
The Library does not possess a large-scale storage system for
data, and some kind of local storage capable of dealing with these
gigabytes of data had to be found (terabytes and petabytes were other
new words encountered early on in the project). This was a problem encountered
by all the Initiatives for Access imaging projects, but
an additional problem faced by the Beowulf project was
that it was necessary to transfer all this data to Kentucky for
Professor Kiernan to work on it. Kentucky could store the data
on its Convex system, but somehow the data had to be got there.
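The scale of the problem is easy to demonstrate. Assuming, purely for
illustration, a working rate of a hundred shots a day (a modest figure
given the fifteen-second scan plus set-up time between shots):

    mb_per_image = 21    # uncompressed full-resolution TIFF
    shots_per_day = 100  # assumed rate, for illustration only
    print(shots_per_day * mb_per_image / 1024)  # ~2.05 gigabytes a day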
Initially, it was felt that the best solution was to copy the
images onto 120 mb magneto-optical discs and gradually transfer
them to a machine which could be used to check the quality of
the images and, when sufficient data had been accrued, back them
up onto DAT tape. DAT offers large amounts of storage very cheaply
- in 1993, 2 gigabytes for less than twenty pounds. However, the
process of copying the images onto optical disc was tiresome
- every time six images had been shot, it was necessary to stop
work and wait for half an hour while the images were copied. Moreover,
the optical drive proved extremely unreliable. Experiences with
the DAT tapes were equally unsatisfactory. The backup procedure
was extremely time-consuming. It could take three times as long
to back up the images as to shoot them. DAT proved very unreliable
as a long-term storage medium. The cartridges needed checking
every six months, a very time-consuming process, and it was often
found that they had failed. DAT proved equally troublesome as
a means of transfer to America. The proprietary DOS software used
in the British Library could not be run on the Unix machines in
America. Although a tar utility eventually made the transfer easier,
it had in the end to be admitted that the cheapness of the DAT
was a false economy. The length of time spent in maintaining and
recovering data from the DAT wiped out the savings achieved through
the cheapness of the tape.
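The portability lesson is worth making concrete. Unlike the proprietary
DOS backup format, a plain tar archive written on one platform can be
unpacked on any other. The sketch below uses Python's tarfile module as
a modern stand-in for the tar utility mentioned above; the file names
are invented.

    import tarfile

    # Written at the London end:
    with tarfile.open("beowulf_batch01.tar", "w") as archive:
        archive.add("images/folio_129r.tif")
        archive.add("images/folio_129v.tif")

    # Unpacked, unchanged, on a Unix machine at the Kentucky end:
    with tarfile.open("beowulf_batch01.tar") as archive:
        archive.extractall("incoming/")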
It was observed to me in a seminar in 1994 that these problems
were not so much ones of storage as of bandwidth, and indeed they
eased as the capacity of the networks improved. The University
of London Computing Centre was at that time investigating the
development of a national data archive based on a Convex Emass
tower. They suggested that this might provide a suitable home
for the British copy of the Beowulf data, and the existing
data was transferred on DAT tape to the ULCC archive. As the
high-speed SuperJanet link became available, it was possible to transfer
image files across the network directly to ULCC. At first, attempts
to transfer images across the network to Kentucky proved very
frustrating as the images took an extremely long time to be transmitted
across the small capacity lines then available, frequently triggering
time-outs in the file transfer protocol (FTP) software, which
cut off the transfer before it was completed. A programme called
`split' was developed in Kentucky to break down the files into
small components which could then be transferred, but such an
elaborate procedure was only suitable for small numbers of files.
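The internals of the Kentucky programme are not recorded here, but the
general technique is easily sketched: chop a large image file into
pieces small enough to survive a time-out, transfer the pieces
individually, and reassemble them on arrival. A minimal illustration
(the chunk size and file names are arbitrary):

    import pathlib

    CHUNK = 512 * 1024  # pieces small enough to survive an FTP time-out

    def split_file(path: str) -> int:
        """Write path.000, path.001, ...; return the number of pieces."""
        data = pathlib.Path(path).read_bytes()
        offsets = range(0, len(data), CHUNK)
        for n, start in enumerate(offsets):
            pathlib.Path(f"{path}.{n:03d}").write_bytes(data[start:start + CHUNK])
        return len(offsets)

    def join_file(path: str, count: int) -> None:
        """Reassemble the pieces at the receiving end."""
        with open(path, "wb") as out:
            for n in range(count):
                out.write(pathlib.Path(f"{path}.{n:03d}").read_bytes())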
As transatlantic network capacity has increased, however, it has
finally proved possible to transfer images by FTP from ULCC to
Kentucky, and this is now the normal method of transfer.
Network links are still, however, not as reliable or as speedy
as one would like, and, while in theory they provide the simplest
link, in practice other methods might yet prove to be more efficient.
One obvious procedure would be to link a CD-writer to the PC operating
the camera and transfer images by CD-ROM. The project has not
yet had access to a CD-writer in London to investigate this, but
experiences with the York Doomsday Play project in Lancaster
suggest that this may be an efficient approach. The images made
with the Kontron camera for this project have been stored at ULCC
then transferred via SuperJanet to Lancaster, where they have
been placed on CD. This is a time-consuming process, and by no
means yet entirely reliable, but nevertheless over 20 gb of data
have been stored on CD in this way. The directors of this project
can now sit at home in Lancaster and quickly call up from CD colour
images of whole manuscripts in the Library's collection - a heady
experience when confronted for the first time, and perhaps
the most dramatic illustration of what the Initiatives for Access
programme was intended to achieve. The likely increase in CD capacity
in the near future (to perhaps as much as 6 gb) and the availability
of cheaper and easier to use external hard drive devices may make
direct physical transfer of images in this way of greater importance
in the short term than network transfer.
Knitting together the many different layers of the Electronic
Beowulf proved to be a further difficult problem. The idea was
to give the user hyperlinks between each different level of the
images. The user would be able to click on the appropriate part
of an image of a folio from the original manuscript and be able
to see hidden letters and ultra-violet readings in their appropriate
place. With a further click, he or she would be able to call up
images of the relevant sections of the Thorkelin transcripts or
Conybeare or Madden collations. We were anxious that the package
should be available for Unix, Mac and PC platforms. A Unix programme
which provided hyperlinks between all the different types of images
was developed in Kentucky. A Mac programme, called MacBeowulf,
was also developed under Professor Kiernan's supervision, which
he successfully demonstrated to the Medieval Academy of America
in 1995. However, development of a PC version proved very difficult
because of the variety of types of PC available, and Professor
Kiernan despaired of ever being able to produce a workable PC
front end. Moreover, it was not clear how this software would
be supported. Worst of all, it seemed likely that the editorial
work would have to be done three times over, once for each version
of the CD.
While the issues associated with this development work were being
investigated, the network browsers had made great strides forward.
Netscape version 2.0 for the first time offered the ability to
display frames containing different images side by side - precisely
the way we wanted to display the Beowulf materials. With the development
of the Java programming language, it also became possible to develop
tools which would help the user of the images, such as a tool
for zooming in on different parts of the manuscript. Brooding
on the problems of providing a front end for Beowulf while
he was on holiday in Carolina, Professor Kiernan suddenly realised
that the network browsers in fact offered all the necessary functionality.
The only problem was that the networks did not have the capacity
to handle the large image files smoothly, even in compressed
JPEG format. In fact, even though the image files will
be distributed as JPEGs, it will be difficult to fit them all
on two CDs. However, the network browsers not only read networked
files but also files held on a local disc. So the final CD-ROM
will use Netscape 3.00 as a front end to read HTML files and images
held on a local CD - the kind of hybrid approach which is likely
to be increasingly common until network capacities are substantially
increased. This offers a number of advantages. The materials on
the CD-ROM will essentially be independent of the software - the
prototype package works under both Internet Explorer and Netscape.
It will also be very simple to make the package available on the
internet when network speeds improve. The CD-ROM is currently
scheduled for publication in the summer of 1997.
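The shape of such a front end is easily illustrated. The sketch below
(the file names and layout are invented for the purpose and are not the
CD-ROM's actual files) generates the kind of frameset page a browser can
read equally well from a local CD or across the network, placing a
manuscript folio beside the corresponding transcript page:

    def folio_page(folio: str, transcript: str) -> str:
        # A frameset places the folio image and the transcript image
        # side by side, as Netscape 2.0 first made possible.
        return (
            "<html><head><title>Beowulf, folio " + folio + "</title></head>\n"
            '<frameset cols="50%,50%">\n'
            f'  <frame src="images/{folio}.jpg" name="manuscript">\n'
            f'  <frame src="transcripts/{transcript}.jpg" name="transcript">\n'
            "</frameset></html>\n"
        )

    with open("folio_129r.html", "w") as f:
        f.write(folio_page("129r", "thorkelin_a_01"))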
Of course, this by no means exhausts all the problems we have
had to confront in creating the Electronic Beowulf. To
obtain images of the Thorkelin transcripts in Copenhagen and the
Madden collation in Harvard, we had to transport the camera, PC
and other equipment to Denmark and America, a considerable logistical
exercise, comparable to arranging a film shoot. When shooting
was ready to begin in Copenhagen, the large photoflood lights
blew up the electrical circuits of the Royal Library. The demands
of finally working all the digital materials into a publishable
form have been immense. These have been borne by Professor Kiernan,
who has had to sort, rotate and crop hundreds of images. He has
had to prepare thousands of complex HTML documents linking together
the images and associated scholarly apparatus. The lot of the
editor is made much harder by the new media.
It will be evident from this discussion that the digital path
is by no means a simple one. There sometimes seems to be a confident
assumption, almost Whiggish in its optimistic expectation of the
inexorability of technical progress, that the kinds of issues
described here will vanish as computers increase in power and
programmes become more sophisticated. Some will certainly disappear
as the technology improves and we become more proficient in the
use of it. Some difficulties have indeed eased in the course of
the project. At the beginning of the project, it was a major exercise
to move a 21 mb file outside a British Library building, which
ran the risk of wrecking e-mail systems and crashing catalogues.
Now we make such transfers routinely. Other problems, such as
the conservation issues involved in direct scanning or the inherent
additional cost of backing up data, may not vanish so easily.
The concept of the Initiatives for Access programme is
that the new technologies will make the Library's collections
easier for remote users to consult. In its experimental stage,
this is not necessarily a valid assumption - if all we had wanted
to do was to make colour images of the Beowulf manuscript
more easily available, it would have been simpler and cheaper
to have published a conventional colour facsimile. The extra cost
of using digital technology is only justifiable because it enables
us to see the great textual universe contained in a library such
as the British Library in new ways. Thus, the Electronic Beowulf
does not simply present us with colour images of the manuscript
but reveals features of the manuscript which cannot be recorded
in any other way and draws together widely dispersed information
about the history of the manuscript.
In short, the Electronic
Beowulf contains information of solid research value which
could only have been assembled by the use of digital technology.
For the humanities scholar, digital technology is interesting
not simply because it potentially offers easier access to source
materials but also because it provides a wholly new research tool
and produces results which are almost priceless.
There is a terrible risk that, as digital libraries enter a phase
of commercial development, this will be lost sight of. The only
way of ensuring that the kind of research-orientated approach
which the Electronic Beowulf has pioneered continues and
develops is not to become fixated on the idol of access and the
prospect of new money-making services, but to maintain and build
on the kind of close cooperation between scholar, curator, conservator,
photographer and technical expert which has provided the basis
of the success of the Electronic Beowulf.