Intellectual Realities and the Digital Library

Francis L. Miksa and Philip Doty

Graduate School of Library and Information Science, The University of Texas at Austin, Austin, Texas 78712-1276, {franmiks, pdoty}@uts.cc.utexas.edu

Abstract

The question is asked, "Why should a digital library be called a 'library'?" Three aspects of the traditional library as a collection of information sources in a place are examined in order to shed light on their meaning to a digital library.

The idea of a collection is examined from the standpoint of both pragmatic and necessary boundaries. The idea of information sources is examined from the standpoint of a source's "work" attributes and of the incommensurateness of such works in a collection. And the idea of the library in a place is examined from the standpoint of logical space. While no final conclusions are drawn, the three concepts provide a basis for considering similar issues in the digital library.

Keywords: Digital library, library, library collections, information resources, intellectual organization, incommensurate data, work (intellectual entity).

1. Introduction

We begin with a question. Why should a digital library, or an electronic library, or a virtual library--for the purposes of the remarks here these three terms will be considered synonymous--why should such a phenomenon be called a "library"? A digital library might well be called something else--a digital information system, or a digital publishing system, to name two possible alternatives. But such alternatives have not been chosen. Instead, "library" has been the term of choice. And this choice has been made not by librarians (who might have been expected to choose it) but rather by computer and information scientists who have been in the forefront of the development of electronic information communications systems over the past three decades.

Now, the term library might seem like a natural choice. Then again it might not be an enduring choice in the same way that "horseless carriage" did not ultimately become the enduring term for the new technology, automobility. The purpose of these remarks is not, however, to quibble over a word, but rather to reflect on certain aspects of the term library that might appear to be instrumental in choosing it for this new technology, but which may well contain implications not intended. What we hope to do here is to explore the question of whether a digital library should be called a library at all (implying, of course, that it might actually be something else entirely) by examining particular traditional aspects of the term. However, in order to keep this discussion to manageable proportions, here we will comment only on those traditional aspects of the idea of a library which have to do with its internal or intellectual realities as opposed to its external or social realities. Topics to be discussed are highlighted in a statement which defines a traditional library, that is, that a library is a collection of information sources in a place. Section 2 below will discuss the idea of a collection; Section 3, the idea of information sources; and Section 4, the idea of a place.

2. The Library as a "Collection."

The traditional library has always been defined as a "collection"--an aggregation of informational objects. Usually, discussion of a collection focuses on such things as its topical scope, so as to determine what should be in it, or its proposed clientele, so as to determine what access arrangements should be made. These are legitimate concerns, of course, but in quickly shifting to them we miss the most obvious thing that the idea of a collection conveys--its implied boundaries.

A collection implies boundaries in the sense that, while collecting at its root simply means gathering, the act of gathering implies discrimination, some objects being included and others excluded. We might suppose that this simply means that informational objects are included and non-informational objects excluded. But this is not the case at all.

In a traditional library objects are normally included or excluded first of all on the basis of pragmatic considerations. It has always been that way. Expense both in acquiring and housing informational objects and the difficulty of obtaining certain ones of them have always demanded limits to collections. And the purpose for which a library is assembled has meant that not all possible objects have been wanted in the collection. After all, even where the informational objects might be few for some given realm, it may not serve the purpose of the collection to include all possible things--items in an unreadable foreign language, items which are plainly redundant or derivative, items representing incompetence, and the like.

A much more trenchant basis for boundaries is not pragmatic, however, but rather that boundaries appear to be necessary by the nature of things. When a collection is assembled it represents a segment of what Patrick Wilson has aptly called the "bibliographical universe" [4] and which is defined elsewhere as,

an abstraction of the accumulated items or objects of recorded knowledge of humankind in all of their various forms--for example, in the form of books, periodicals, graphics (pictorial representations of one kind or another), sound recordings, motion pictures, electronic data compilations, business, governmental and personal records, and so on. [2]

The abstraction called the bibliographical universe has limitations, of course. For example, it includes only a single copy of each unique recorded knowledge object. And it necessarily excludes items which have become lost or which are intentionally kept from the public eye. The result is that not every imaginable object of recorded knowledge is included in the abstraction.

Even with such limitations, the bibliographical universe remains enormous, however. Think for a moment of the world's production of recorded informational objects even in a single year--perhaps a million new editions of books, countless periodical publications where even those recognized for scientific and scholarly merit number in excess of 150,000 titles, not to mention those which have ceased publication but are still useful. And these figures do not include serial publications of much more limited purview, government publications at any level, audio-visual items of all kinds, archival and other organizational records numbering in the trillions. To this we add a growing number of electronic sources, especially those representing visual images and multimedia (including transmission of satellite imagery). These may best be spoken of in terms of terabytes of data and they are growing in numbers at exponential rates.

All of the foregoing is mentioned because the reality of a collection, at least traditionally, is that it necessarily represents only a very small segment of a gigantic whole. Access to these materials has been traditionally provided by creating multiple libraries or collections, where the expense and labor in assembling, describing, storing, and making them available is shared among countless people and organizations.

Within the context of a digital library, does the idea of a collection with both pragmatic and natural boundaries still hold? We ask this question because it does not seem clear in the literature if what is meant by the digital library is a series of such collections or one such collection. For example, at one of the exploratory seminars prior to the NSF Digital Library initiative, Michael Lesk suggested that one of the purposes of the seminars was to express the idea of a digital library forcefully enough so that Congress would invest monies for creating it--that is, for creating a digital library for the United States.[1] What possibly could this mean, however, if the very notion of a library implies boundaries of the kind described here? Is not the idea of a digital library also necessarily tied to the idea of boundaries as well? In short, is it not to be expected that there will be many digital collections (i.e., libraries) just as there are presently many traditional libraries, or does the idea of a digital library preclude boundaries in some extraordinary way? And finally, if boundaries are to be recognized, what would they be?

3. The Digital Library as a Collection of "Information Sources."

The traditional library ordinarily implies the idea of a collection of "information sources" which have the potential of informing in one way or another those who consult them. While the informative nature of an information source seems straightforward, there would appear to be more to the idea of such a source than its informativeness. One primary attribute among many that information sources have is that they have regularly been consulted and used in terms of their separate and unique identities as intellectual or artistic entities. And as separate and unique entities they are by nature highly incommensurate in the ways they assemble information, the reasons for assembling information, the intended audiences for the act of assembling information, and the like.

The idea that an information source has a separate and unique identity as an intellectual or artistic entity arises from the view that each such informational source has two different but interrelated kinds of attributes--those related to its content and those related to its container or medium of transmission [2].

Content attributes include such things as a source's intellectual form or genre, its topicality, its intended audience, and so on. But of special importance among content attributes are those which in library cataloging are clustered around the idea of the "work" or "works" an informational source contains. A work is an entity constituting the intellectual or artistic effort of an intelligent being in representing his or her knowledge. It is called a work in the same sense that one designs and creates, say, a sidewalk or a flower arrangement. Each of the latter is the result of a direct effort that begins with the mind. In the case of information sources the result is a discrete intellectual or artistic entity or product which has an intentional structure, a storyline, arguments, and so on, with an identifiable beginning, middle, and end.+

All information sources have the attribute of being a work in the sense spoken of here, although not all will have been expressed with equal amounts of intellectual or artistic skill and control. Indeed, there is enormous variety among them. For example, some works may represent the expressions of single individuals, but others will represent layers of participation by several persons as in the creation of a work by one or more persons, its translation by one or more others, its augmentation with illustrations or commentary and its editing by still others. In contrast, other information sources represent the "utterances" (metaphorically) of corporate agencies. Still others will represent the confluence of human and computer capabilities, as in a database software package. It will have been designed by a team of persons. But afterwards someone will use the package to formulate a specific database structure, someone else will input data into the structure, still someone else may well devise a report program for the data, and a report will be produced by the computational power of the computer which runs the program.

Likewise, there is immense variety in the way expression in information sources has been shaped, not simply in terms of intellectual structure, but also in the kind of expressive medium used--that is, whether the expression is textual and discursive, textual and elliptical (as in textual tables), numeric, graphic, composed in a special language (such as mathematical or chemical notation), and the like. Indeed, some information sources may be distinctive by appearing to be combinations of many of these motifs or by appearing to be chaotic in their conception and execution, at least in terms of some standard of expression.

The intersection of all of these kinds of variation in information sources is so multifaceted, in fact, that it is fair to describe the whole universe of such sources (or even a collection of them) as highly incommensurate in the way they present information. In fact, outside of differences that owe merely to variations in the container in which information sources are found,‡ it seems fair to suppose that all information resources represent unique variants in intellectual structure, formatting, expression, etc.
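
The distinction between content attributes and container attributes can be pictured as a small data model. The following Python sketch is illustrative only: the class name, the field names, and the sample records are hypothetical and are not drawn from any cataloging standard or bibliographic database. It groups records that differ only in their container attributes under the single work they carry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One bibliographic record (hypothetical fields, for illustration only)."""
    # Content attributes: identify the intellectual or artistic entity.
    creator: str
    work_title: str
    # Container attributes: describe the physical or digital carrier.
    publisher: str
    year: int
    medium: str


def group_by_work(records):
    """Cluster records that share content attributes, ignoring the container."""
    works = defaultdict(list)
    for rec in records:
        works[(rec.creator, rec.work_title)].append(rec)
    return dict(works)


# Invented sample data: three carriers of one work, one carrier of another.
records = [
    Record("Author A", "Novel X", "Publisher 1", 1900, "print"),
    Record("Author A", "Novel X", "Publisher 2", 1950, "print"),
    Record("Author A", "Novel X", "Publisher 3", 1990, "microform"),
    Record("Author B", "Treatise Y", "Publisher 1", 1920, "print"),
]

works = group_by_work(records)
for (creator, title), items in works.items():
    print(f"{creator}, {title}: {len(items)} record(s)")
```

In practice, deciding whether two records carry the same work is a matter of cataloging judgment (translations, revised editions, and variant titles all complicate it) and is far less mechanical than the exact match on two fields assumed in this sketch.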

The reason for pointing out the "work" attributes of information sources as well as their incommensurateness is that much (though not all) information seeking is shaped by and inextricably connected to these attributes. Ever since information sources first came into existence they have regularly been consulted in terms of the intellectual or artistic entities they contain and in the light of the incommensurateness of their expressions of information. Indeed, a large portion of humankind's intellectual activity has been devoted to the task of intellectually grappling with such objects in these terms, often assembling them as families of items based on how they refer to one another, or because they refer to common ancestor sources, or because they contain common text upon which comments are made, or because they have common sources (including publishers), or because of any number of other attributes which show their "family" relationships as whole intellectual or artistic entities.

This aspect of the phenomenon of information sources in the traditional library is pointed out because of undercurrents found in discussions of libraries in electronic environments that speculate that the new technology will and should promote something altogether different both in the nature of information sources within the environment and in the nature of their use. These speculations stress the possibility of new kinds of information sources, sources which, for example, evolve through emendations and changes put forth by a variety of successive participants and which, therefore, would seem to have few, if any, of the work attributes of information sources just described. Likewise, access to sources is occasionally described as a matter of hypertextual linkages so finely wrought and controlled that retrieval would resemble MEMEX-like navigation over bits and pieces of large numbers of sources with little concern for the individual work attributes of any particular one. Indeed, from the latter picture it would appear to be only a short leap to information sources that are all controlled as to their structure the better to access their parts. Or it might suggest a situation in which information sources are all entered into an electronic environment in such a way as to blend them together conceptually (much like Doug Lenat's CYC project at MCC) so that with a skillful search engine it would be possible and desirable for the electronic mechanism to respond to the inquirer much like a person answering a question.

The point to be made here is not somehow to deny such possibilities or to intone against them as "ought nots." Who knows what this new technology might become? Few in 1905 could have imagined what the horseless carriage has now become nor how over the years it would change the idea of personal and social mobility. Nor would it do to point out that endless and evolving information sources already exist (for example, in the form of legal codes) and that some forms of searching for information have always disregarded the work attributes of information sources. In the latter respect, for example, an administrator in charge of enrollment probably neither cares nor reflects on the fact that information related to the current enrollment of S. Smith comes from the structured digital entity called the school's "Student Database." Likewise, a person interested in the highest batting average in the American League in 1949 will not likely be too concerned that the information came from the World Almanac.

Rather, the point to be made is that the "work" attributes spoken of here have been integral to human intellectual endeavors ever since information sources came into existence millennia ago. Further, the idea of a library has been inextricably attached to the idea of providing access to such sources in terms of their attributes as works (among other things) in response to that intellectual pattern. Thus, while new possibilities may well become part of the scene, if a digital library is, in fact, to be a library, would it not appear that this aspect of a library must be an integral part of its conception? Or is a digital library to be something else altogether?

4. The Library as a Collection of Information Sources "in a place."

A third aspect of the traditional library that merits our attention is the relationship it has to the idea of location in that a traditional library may be defined usefully as a collection of information sources "in a place." In fact, one commonly thinks of a library in terms of its physical and, therefore, its spatial location.

Here especially, one might think, is the point where the idea of the digital library really does distinguish itself from that of the traditional library. A digital library plainly does not need to exist in one place. It can be distributed over many different servers and clients in many different places. Nevertheless, there is an aspect of the idea of "the library as a collection in a place" that shows that an important factor about location which makes any given collection of information sources a library is not actually a place as a physical location but rather a place as an intellectual construct--a logical or intellectual space, if you please--where location implies a rationalized set of relationships imposed on the members of the collection.

It is precisely this aspect of a traditional library that ties its divergent elements together as an integrated entity and makes it more than merely a loose assemblage of items.

It is the reality of location as a logical or intellectual space which makes doubtful whether even well-designed consortia and systems of separate collections are libraries in and of themselves. It is also this same sense of logical or intellectual space which certainly contrasts with something as pervasive as Internet gopher space today. In the latter, one is faced with a huge variety of useful sources not tied together as a single intellectual construct, neither in the sense of structure nor in the sense of access methods. It is this reality that makes us readily conclude that gophers considered together do not make a library; that something more is needed.

The latter situation is not unlike the specter of the shopping resources of a large city, where even directories like the yellow pages cannot overcome the disparate nature of the way goods and services are organized, represented, and made available. Despite the friendly injunction to "let your fingers do the walking," an individual has to amass a large amount of personal knowledge about individual stores and agencies and the various ways they organize their wares and conduct their business in order to negotiate them successfully for even a single item.

One answer to the same confusion with respect to information sources has always been to make a library of them, where the idea of the library includes the construction of a set of arrangements that overcomes the disparateness of the individual sources by relating them to one another in terms of a single, operational, intellectually structured whole. This reality is one of the inventions of the modern library of the past century or so, and, regardless of how well or poorly we might think the result has been, or how relevant or irrelevant, the lessons of that institutional experience are highly instructive.

The original vision of the library as propounded by nineteenth century pioneers like Melvil Dewey and Charles A. Cutter (followed by others equally or even more notable in the twentieth century) was more than simply a set of pragmatic devices such as catalogs, classification systems, and reference desk procedures. It began in reality with a strong (as opposed to a weak) view of the cohesive and interrelated nature of knowledge itself, of humankind's accumulated social knowledge. To these pioneers, organizing information sources into a cohesive intellectual structure, regardless of the form of the structure, was derivative of that preliminary vision of knowledge. Their efforts were shaped by what they assumed about that knowledge structure. When implemented in the form of bibliographic control practices, that same structure provided a pathway to humankind's social knowledge.

There can be little argument that the systems these people created (and with which present-day libraries still contend) have severe limitations with respect to modern information needs. Some of the limitations have arisen from their assumptions about knowledge itself, not only in how they viewed its organization (i.e., linearly, in chiefly a two-dimensional hierarchical structure, with monothetic classes) but also that there was only one true way to organize it or that there was only one purpose for organizing (i.e., for document retrieval). Other limitations arose from the technology they had at their disposal (for example, single entry book catalogs strapped by printing cost limits, and card catalogs strapped by individual record space limits) or in how they applied the technology (for example, classification limited in application primarily to single-entry shelf sequences, at least in the U.S.). Still other limitations resided in inadequate ideas about users' habits in looking for information, including the failure to move beyond pre-coordinated exact-match equations of document representation and controlled vocabulary searching protocols.

These people did contribute two essential components to the idea of a library as a collection in a place, however. First, they focused heavily on the idea that what makes a library in large measure a library is the intellectual or logical space necessary to accommodate the information sources a library collects. Second, they appropriately assessed the reality that creating such a space will necessarily require a great deal of time and effort. The latter was not unlike creating a commercial empire or, in academia, writing a comprehensive treatise which as an introduction to a topic is so complete it is not likely to be surpassed for years to come. The result of their work was the rather incredible (when one thinks about it) possibility of being able to enter through the door of that agency called a library with a reasonable expectation that the information sources collected there have been organized into a sensible whole and that, even if the client does not understand the structure, it would provide a basis for finding sources that fit his or her need. In this respect the metaphor of the library as a door through which one goes to find an intellectually organized set of information resources is very provocative.

Here, as in the first two points of this paper, we ask the obvious question: if a digital library is to be a library at all, must it not contend with this extraordinary need to create a logical space, one that in reality accommodates the boundaries among individual information sources as works, including their incommensurability? There has been talk of extraordinary solutions to some of these problems in the new electronic environment--knowbots, for instance, which can search for information in hyperspace apart from any organization of that space, or gigantic parallel processing mechanisms that would obviate the initial natural act of information searching, which is the exclusion of searching routes, the latter commonly in terms of semantic relationships inherent in the idea being focused on and in the light of the structure of knowledge in general. Such alternatives may ultimately become the appropriate way to proceed, of course. But then, would not the result appear to be something different than what is understood to be a library?

References

[1] Lesk, Michael. 1993. "The Digital Library: What is it? Why Should it be Here?" In Source Book on Digital Libraries. E. A. Fox, ed. Blacksburg, Va.: Department of Computer Science, Virginia Tech University, TR 93-35. (Print version of electronic file.)

[2] Miksa, Francis. 1994. "The Universe of Knowledge, the Bibliographic Universe, and Bibliographic Control." Ch. 1 in Library Cataloging and Bibliographic Control. Austin, Tx.: Ginny's Copy Service.

[3] O'Neill, Edward T. and Vizine-Goetz, Diane. 1989. "Bibliographic Relationships: Implications for the Function of the Catalog." In The Conceptual Foundations of Descriptive Cataloging. San Diego: Academic Press, 167-179.

[4] Wilson, Patrick. 1983. "The Catalog as Access Mechanism: Background Concepts." Library Resources and Technical Services 27, no. 1 (Jan/Mar): 4-17.


+ We exclude here for the sake of simplicity the phenomenon of an "incomplete" work which occurs because its creator was unable to complete it or uninterested in doing so.

‡ The Expedition of Humphrey Clinker, an epistolary novel by Tobias Smollett first published in 1771, has in excess of 100 different records in the OCLC database [3], but it is doubtful that they represent more than one intellectually unique structured entity or work. Different editions have arisen mainly from republication in different physical formats.