The Digital Library in Theory and Practice: a historian's view

Andrew Prescott


One of the recurrent images in Peter Ackroyd's recent novel, Dan Leno and the Limehouse Golem, is of the British Museum Reading Room as "the true spiritual centre of London, where many secrets might finally be revealed." Much of the novel is set in the Reading Room in the early 1880s. Karl Marx, George Gissing and the comedian Dan Leno work side by side "unaware of each other as if they were sealed in separate chambers." Unknown to them, their different preoccupations intersect and spill out onto the streets of London to involve them in a series of horrific murders, which are described in a diary which itself ends up as an Additional Manuscript in the British Museum.

In the novel, Gissing is working on an article on Charles Babbage's calculating machines. Ackroyd imagines that Babbage's Analytical Engine - a prototype computer - was actually built instead of remaining a series of drawings. It is stored in a building in Limehouse, where it is carefully tended until the time "when the public mind is fully prepared for its use." Gissing goes to inspect the machine, which he sees as a "metal demon summoned by the sullen appetites of men," the spiritual counterweight to the Museum Reading Room. In Gissing's mind, there is a connection between Babbage's vision of a world in which all phenomena are tabulated and enumerated, and the hopelessness of life in East London.

Ackroyd's juxtaposition of the Museum Reading Room in the west of London and Babbage's Analytical Engine as the genius loci of East London is a potent image, and one that seems very pertinent to discussion of the digital library. It is an ironic piece of symbolism in that Difference Engine No. 1, the calculating engine that preceded Babbage's vision of a mechanical computer, was offered to the British Museum for display when work on its construction was abandoned in 1842. It was thought inappropriate for the Museum and - in what might be seen as an anticipation of more recent networking politics - the engine and Babbage's designs were instead assigned to the Museum of King's College London. King's College afterwards got rid of the engine and managed to lose many of Babbage's priceless drawings, but in Derek Law's presence I won't dwell on that.

The old Museum Reading Room in the west, Babbage in the east. Likewise, current discussion of the future of libraries often pits the old-fashioned centralised humanities-dominated research library against the dispersed, service-orientated and technology-driven digital library. Needless to say, images of this sort are superficial, partly because they lack historical depth. The Museum Reading Room which Ackroyd describes was - with its prefabricated stacks, mobile presses, electric lighting, automated delivery of requests, and mechanical copying of catalogue entries - as much at the forefront of technology as any of Babbage's inventions. Visitors from all over the world came to marvel at it, not for the beauty of its architecture but for the way in which it pioneered the use of technology in support of scholarship. Panizzi's "iron library" was the digital library of its day. It is no surprise to find that Panizzi and Babbage were in correspondence, and one library historian has not been able to resist the speculation that they discussed ways in which Babbage's engines could be of use in a library.

Just as the Museum Reading Room/Babbage juxtaposition strikes me as unsatisfactory because it lacks historical depth, so much of the current discussion of the digital library strikes me as shallow because it lacks historical perspective. It may seem surprising to suggest that, in discussing a new technological development, the historical background should be considered, but, as Marc Bloch has pointed out, it is a fallacy to imagine "the course of human evolution as a series of short, violent jerks, no one of which exceeds the space of a few lifetimes." It seems that many commentators on the present technological changes implicitly espouse such a view. It has been extraordinary in recent years to see in the literature on computing the re-emergence of that nineteenth-century confidence in the inevitability of technological progress and its irresistibility as a force for perfecting human society. It sometimes seems as if the great intellectual shifts of the earlier part of this century - the recognition that "progress" is not inevitable and that technological developments are not invariably a force for good - have been forgotten. One of the most old-fashioned books of recent years must be Nicholas Negroponte's Being Digital, with its conviction that the best technology will always triumph, its conclusion that "Digital technology can be a natural force drawing people into greater world harmony," and its affirmation that, due to "the empowering nature of being digital," "we are bound to find new hope and dignity in places where very little existed before."

As Marc Bloch pointed out, we do not need to master Volta's ideas about galvanism to run a dynamo. However, in understanding the ways in which technological innovation can change society, a historical perspective is unavoidable. Many discussions of digital technology do, of course, contain some superficial historical references. The comparison with the introduction of movable type (a comparison which, once again, Babbage was the first to make) has become wearisome, and is not usually put in a helpful way. What is interesting is not that the printing press helped spread new ideas, but that the process was more complex than might at first be imagined. That the appearance of the printing press did not automatically lead to reform and revolution is evident from seventeenth-century Russia, where printing was closely controlled by church and state, so that most printed books were religious and the manuscript remained the chief means of textual dissemination. When significant secular printing appeared at the beginning of the eighteenth century, it was at the behest of the Czar himself. A closer study of the effects of the introduction of printing might encourage a more subtle view of the likely impact of digital technology.

An exception to the unhistorical view that is usually taken of digital technology is the brilliant work of J. David Bolter, particularly his book Turing's Man, which, although now more than ten years old, is still the best study of the cultural impact of computers. Bolter argues that one of the defining characteristics of western society is that it is constantly in a state of technological revolution. Each epoch has its own defining technology, which permeates the culture of the period and opens up new intellectual perspectives. In the case of the ancient world, it was the spindle. In the sixteenth and seventeenth centuries, it was the clock. The development of the mechanical clock is a useful corrective to some of the more sweeping claims made for digital technology. As it became possible to divide time into very precise units, "abstract time became the new medium of existence. Organic functions themselves were regulated by it: one ate, not upon feeling hungry, but when prompted by the clock; one slept, not when one was tired, but when the clock sanctioned it." These changes in perception can be seen as underpinning the thought of Descartes and Newton, with motion becoming a function of time and time divided into arbitrary units. But equally it would be superficial to suggest that these changes were merely the result of the invention of mechanical clocks. The urge to develop clocks reflected the cultural needs of late medieval society, and particularly the fact that Christianity was a time-based religion.

There is, however, a very striking difference between the current technological revolution and earlier revolutions. This is the extent to which the likely future impact of the new technology has been exhaustively discussed and analysed. It is almost as if we are anticipating the changes in advance of the technology actually being available. This is a tendency that goes back to the earliest days of computerisation. Although Babbage's Analytical Engine never got off the drawing board, his own works and those of his associates such as Ada Lovelace and Dionysius Lardner describe the intellectual and social implications of this emergent technology with an acuteness which makes their work still very relevant. This is not simply because Babbage and the others were remarkable visionaries. It also reflects the fact that the Analytical Engine was part of a rigorously worked out view of political economy. As a result, Babbage's views on the social implications of computing are sometimes more incisive than those of contemporary commentators.

The computer is a child of the industrial revolution. The immediate inspiration for Babbage's inventions was a French commission set up under Baron Gaspard de Prony to produce a highly accurate series of logarithmic and trigonometric tables. Mathematical and astronomical tables were essential aids to navigation and the prerequisite of much scientific work, but the production of such tables without significant mechanical aids was an enormous undertaking. Moreover, even the best tables contained many errors, partly due to arithmetical mistakes and partly because of the difficulty of typesetting and proofreading long numerical tables. Prony sought to apply the industrial method of division of labour to the production of tables. The workforce was divided into three groups. The first consisted of half a dozen eminent mathematicians, who identified the formulae most suitable for calculating the tables. The second group contained analysts who assigned numerical values to the algebraic expressions in the formulae. These were then handed over to a third group of nearly a hundred people who undertook the routine arithmetical calculations.

Prony's method was for Babbage a revolutionary breakthrough, since it showed, in Babbage's words, "that the division of labour can be applied with equal success to mental as to mechanical operations, and...it ensures in both the same economy of time." Babbage realised that a logical extension of this intellectual industrialisation was that the work of the third group should be undertaken by machine. Moreover, if the machine directly produced the printed version, the main causes of error in the production of tables would be eliminated. This was the origin of the Difference Engine, which was really just a calculating machine. Whilst designing this machine, Babbage realised that the work of the analysts in converting formulae to numbers could also be automated. This was the aim of the Analytical Engine, which would receive the formulae through punch cards. The flexibility of the punch card system permitted the combination of different operations, leading to a form of programming.
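For readers curious about what "the routine arithmetical calculations" handed to a machine might look like, here is a minimal sketch in Python, assuming the method of finite differences from which the Difference Engine took its name. Once the starting value of a polynomial and its differences are known, every further entry in the table follows by addition alone. The function name tabulate_by_differences and the sample polynomial are illustrative choices of mine, not Babbage's notation or mechanism.

```python
def tabulate_by_differences(initial_differences, steps):
    """Tabulate a polynomial using only repeated addition.

    initial_differences: [f(0), first difference, second difference, ...]
    for a polynomial whose highest-order difference is constant.
    (Illustrative helper, not a reconstruction of Babbage's engine.)
    """
    diffs = list(initial_differences)
    table = []
    for _ in range(steps):
        # The leftmost column always holds the current tabulated value.
        table.append(diffs[0])
        # Each column absorbs the (not yet updated) column to its right:
        # no multiplication, only addition, as in the mechanical method.
        for i in range(len(diffs) - 1):
            diffs[i] += diffs[i + 1]
    return table
```

For the polynomial x*x + x + 41, the starting value is 41, the first difference 2 and the constant second difference 2, so tabulate_by_differences([41, 2, 2], 5) yields [41, 43, 47, 53, 61]. The point of the sketch is the division of labour: choosing the polynomial and its starting differences is the analysts' work, while the loop is the "third group's" drudgery that a machine can take over.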

The aim of Babbage's computer was then the same as that of the spinning jenny or the power loom - to facilitate a more efficient division of labour and use machines to perform repetitive work more cheaply and accurately than humans. This has remained the fundamental aim of computing ever since. In everything from databases to virtual reality, the starting point has been that, with a computer undertaking mechanical and repetitive tasks, thinking becomes easier and more productive. In one direction, therefore, the computer, by requiring stricter division of labour, promotes greater intellectual specialisation. However, unlike the spinning jenny, the computer is a general purpose machine, which can perform an enormous variety of tasks. It opens up new intellectual vistas and undermines the division of intellectual labour which it is intended to promote. There is perhaps a contradiction at the heart of computing, which recalls the contradictions in nineteenth-century capitalism.

This was evident to Babbage and his circle. Ada Lovelace couldn't resist pointing out that the Analytical Engine might be used to write music as well as to calculate the Bernoulli numbers. Moreover, the Engine itself was a product of Babbage's wide-ranging interests, particularly the combination of his background as an academic mathematician and his profound interest in the new manufacturing technologies. Babbage was an opponent of narrow academic specialisation and criticised British scholars for their lack of interest in the industrial changes going on around them. Of all his works, the Ninth Bridgewater Treatise perhaps best illustrates Babbage's inter-disciplinary instincts. He uses the actions of the Difference Engine as a starting point for some novel theological insights (the famous idea of "God As Programmer"), goes on to demolish Hume's views on miracles using probability theory, and concludes with a series of wide-ranging appendices, one of which outlines the principles of dendrochronology, many years before this came into general use.

The theme of this symposium, an examination of the way in which the advent of the digital library can help draw together the sciences and the humanities, to my mind goes right to the heart of some of the chief philosophical issues associated with computing. It should not be taken for granted that the digital library will necessarily promote greater contact between disciplines. One of the prominent themes in discussion of the digital library is the difficulty of coming to terms with the vast quantities of information to which researchers will have access. A natural reaction in such circumstances is to strengthen the division of labour by focussing more intently on one's own special interests and not branching out into other areas. It has sometimes been argued that the use of computers by humanities scholars will inevitably promote closer contact with the sciences, because humanities scholars are using a technical tool. This is most eloquently put by David Bolter: "In the long run," he declares, "the humanist will not be able to ignore the medium with which he will work daily: it will shape his thoughts in subtle ways, suggest possibilities, and impose limitations, as does any other medium of communication." The extent to which this will happen, however, depends on the way in which the new medium is structured and the sort of facilities offered to the scholar. How far is current thinking about digital libraries likely to encourage the scholar to look beyond his own horizons?

The best summary of current work in this field is probably the special issue of the Communications of the Association for Computing Machinery issued last April. This contains reports on most of the leading projects in the field, particularly those funded under the NSF Digital Library programme, together with articles summarising the main general issues. The impression given by this special issue is that only limited consideration is being given to the theoretical aspects of the digital library. The focus is on practical questions such as intellectual property issues, the best sorts of interface for handling large amounts of information, and the problems of storing and transmitting large quantities of textual and other information. If we take a Babbage-type view, and look at the way in which the digital library is affecting the division of labour, it seems that the digital library is currently focussing only on the lowest and most mechanical elements of library provision. (Exceptions to this, I should add, are the various electronic journal projects being funded under both the American and British programmes, which seem to me to be exploring more innovatively the ways in which the new media allow research to be presented in a way that is not possible in print.)

During the 1970s and 1980s, computers were used to streamline the most basic and repetitive tasks in libraries: the filing and maintenance of catalogues or the ordering of new books, for example. This allowed bibliographical research to be undertaken more quickly. In order to identify all the early German books in a particular library, for example, it was no longer necessary physically to read the catalogue. The development of networking allowed this information to be shared more readily, so that the need to produce printed or microfiche catalogues began to tail off. Again, however, this only affected the lowest levels of library work - the way in which remote enquiries are handled, for example.

The digital library takes these developments a little further, but not much. The ACM Special Issue emphasises that there is no common view of what constitutes a "digital library," but from the projects described there it seems that the digital library is mostly envisaged as a system for holding textual data in electronic rather than book form, on servers which can be accessed remotely. The digital library might also hold image, sound and video material, but these do not yet seem to have assumed great prominence in current projects. The technology for creating, storing and transmitting this text is not, by and large, very innovative. The chief creative focus of existing projects seems to be the development of interfaces which allow the user to digest very large quantities of information. This might involve, for example, replacing traditional bibliographic interfaces with visual tools of various kinds.

The digital library is, as presently conceived, primarily concerned with applying automation to the storage, location and transmission of information - it is focussed almost entirely on making the stock control function more efficient. The assumption is that this will change the pattern of research, but I have doubts. In terms of a division of labour, the most basic and mechanical task of research is identifying the relevant literature and assembling it, by making notes, photocopying or whatever. The digital library will make this much faster and cheaper. It won't be necessary to trail from London to Edinburgh, then to Yale, then to Paris to undertake research, and all that copying won't be needed. Moreover, the researcher is more likely to be able to identify all the relevant literature. But will he then necessarily do anything different with the information he has gathered? The digital library will make research quicker, cheaper and much more efficient, but it won't radically change the way we use the information. What we are constructing at the moment is the Difference Engine. The interesting issue is what the next stage up, the Analytical Engine, will be.

One of the assumptions behind the present model of a digital library is that a library consists of texts, and that all a researcher wants is the information in those texts. However, if one looks at a large research library like the British Library, it is evident that it consists of much more. When you enter the British Museum and turn into the Library's galleries, the first room is filled with one of the greatest collections of early printing, the Grenville Library. The next room, the Manuscripts Saloon, contains thousands of volumes of manuscripts. This leads you into the huge gallery containing the King's Library, a vast eighteenth-century library. All this publicly visible rare material is only the tip of the iceberg. The rare material in a large research library like the British Library comprises nearly half of the total reference holdings. Researchers are interested in more than the textual information in these volumes. Their provenance, the way in which they were printed, the type of material they were printed on, their binding and illustration are all the subjects of intense scholarly scrutiny. My feeling is that the digital library will only start changing the nature of research when it helps scholars to approach these issues in a new way.

My introduction to the world of the digital library was working on the Electronic Beowulf project with Kevin Kiernan. Kevin's vision of this project was not simply that of making the Beowulf manuscript more widely accessible by producing digital images of the manuscript, but of providing a tool for more effective exploration of the manuscript. The digital images can be magnified to show a microscopic level of detail. There is the possibility of using image processing techniques to enhance and make legible damaged sections of the text. Hundreds of letters in the manuscript are concealed by Victorian conservation work. These can be read with the aid of fibre optic backlighting, but cannot be recorded with a conventional camera. With a digital camera, however, images of them can be made and then pasted in their place on the manuscript, so we can reconstruct how the manuscript looked a hundred and fifty years ago. Although the text of the poem is known from only one manuscript, there are a number of later transcripts and collations which contain important evidence for the state of the text. These are scattered across two continents, and by linking together computer images of them, the available evidence for the text of the poem can be more effectively analysed.

Kevin's inspiration came from medical imaging, and his work shows clearly how, by learning new experimental techniques from the sciences, humanities research can be taken into a new dimension. Moreover, the relationship can be a symbiotic one - materials such as those gathered for the Beowulf project can pose interesting questions for computer scientists. It might be objected that the Beowulf manuscript is a special case, but in fact many manuscripts pose analogous issues. The fire which damaged the Beowulf manuscript also affected hundreds of other manuscripts of fundamental importance for the study of English literature and history. The techniques of the Electronic Beowulf project will be very valuable in studying these manuscripts, and we are anxious to extend the project along these lines.

In almost all areas of humanities research, the computer can provide new ways of looking at old problems. I will give just a few quick examples, but the list could be extended almost indefinitely. I was talking to a papyrologist the other day. The study of papyri always seems to me like a vast jigsaw puzzle, with the manipulation of fragmentary and damaged scraps which often cannot be touched. The papyrologist told me that he now generally prefers to work with digital images of papyri, as they can be moved around and rearranged at will, and the ability to alter contrast levels makes the digital images easier to read than ordinary photographs. The same considerations apply to early oriental materials, and the British Library hopes to establish an "Electronic Asia" project to undertake such research on its large and internationally significant oriental collections.

Students of medieval drama have long complained about the "tyranny of the script." The focus has naturally often been on the manuscripts recording the words which were spoken by the actors, which means that performance details and the rich and complex iconographic context of the play, with its connections to a wide range of media from stained glass to woodcarving, are given secondary importance. Multimedia provides a way of returning the play to its context. The manuscript can be placed alongside videos of performances and the images which the medieval actors would have had in mind in interpreting the play. Professor Meg Twycross at Lancaster University is currently developing a project to explore these possibilities.

One early technical aid used in reading manuscripts was gallic acid, a chemical reagent which made a faded reading shine clearly for a few brief moments, then destroyed it. Thousands of manuscripts are now illegible due to the application of these reagents. They cannot be read under ultra-violet or infra-red light, and computer imaging offers at the moment the best possibility of recovering these readings. We have an illegible Magna Carta in the Library on which we've tried some of these techniques. Computer imaging is not just of interest in the study of manuscripts. It can be valuable in handling three-dimensional objects. High resolution digital images of seals and coins allow high magnification and offer the possibility of studying details which are invisible in conventional photographs.

I could go on all morning describing the possibilities, but I think I've made the point. The examples I've given are mainly manuscript ones, but similar illustrations could just as easily be piled up for printed materials. In work like this, we are seeing the beginnings of what Seamus Ross has called "digital research." In order for digital libraries to change the nature of research - to be analytical engines rather than difference engines - they need to be research libraries as well as access libraries. Almost all the humanities disciplines depend on what Marc Bloch has called "the tracks of the past," which range from manuscripts and printed books to pots and jewellery. As humanities scholars, we need to squeeze every piece of information we can from these remains - they are our experimental materials. The computer provides an exciting new tool for this, but we need the help of our scientific colleagues in developing new techniques for doing so.

However, it might seem that there will always be a difference between science and the humanities, in that what I have been describing is primary-level research. Although projects like the Beowulf project may make us more aware of the nature of the evidence on which more synthetic interpretation depends, it might be felt that interpretative studies will always be text-based. Can literary criticism ever be expressed in anything other than a text? I suspect that, as we move into a more visual environment, that might change. Suppose, for example, that one were trying to define the causes and effects of a historical event: this might be more effectively done using a visual model than through text. If one were undertaking a regional study like Fernand Braudel's Mediterranean World now, the insights that GIS would bring to bear would certainly be relevant. Our view of the library, and indeed of the digital library, will have to shift as we see the emergence of secondary literature using a variety of media, which may be recorded not in single bibliographic units but in much looser configurations.

Which brings us back, finally, to Babbage. In the Ninth Bridgewater Treatise, Babbage pointed out how a knowledge of mechanical laws gives you a different view of the world. When you speak, the waves spread out, gradually losing strength and impetus, but still remaining, until the only trace is perhaps in the movement of molecules, but still there. Likewise, the cries of a drowning man would create sound waves which would spread out through the water, until only the water atoms retained the impression of them. With a sufficiently powerful computer, Babbage speculated, you might be able to detect those faint traces and recover the last words of the dying man. This would be true of any object - the Beowulf manuscript would retain the faint impression of the conversations the scribe had while writing it. That is the meaning of the phrase I suggested to Kevin as a motto for this conference, and which Ackroyd also uses in his novel: "Every atom, impressed with good and with ill, retains at once the motions which philosophers and sages have imparted to it, mixed and combined in ten thousand ways with all that is worthless and base. The air itself is one vast library, on whose pages are for ever written all that man has ever said or woman whispered." Babbage's notion that you could recapture those words must have seemed bizarre in 1837, but in these days of chaos theory it seems less strange. Perhaps one day we will hear the Beowulf scribe speaking. There is certainly a challenge there which I think we should take up.