Designing digital libraries for post-literate patrons

Peter J. Nürnberg, Erich R. Schneider, John J. Leggett
Center for the Study of Digital Libraries, Texas A&M University, USA
email: {pnuern, erich, leggett}@csdl.tamu.edu

Abstract: Many Web and Internet technologies have traditionally been used to serve information stores across machines and between people. There has been a great deal of recent interest in using these information services to support digital libraries. Digital libraries research is an interdisciplinary effort that must synthesize existing research from highly disparate fields. This paper examines two such contributing fields, information systems and orality-literacy studies, and applies them to a particular digital library domain: botanical taxonomic work. In trying to build digital libraries for botanical taxonomists, we show how two widely differing fields can each provide part of a solution that neither can provide alone.

0. Description of the Paper

Many Web and Internet technologies have traditionally been used to serve information stores across machines and between people. There has been a great deal of recent interest in using these information services to support digital libraries. Research in digital libraries, like any interdisciplinary endeavor, is complicated by the need to consider and synthesize results from several fields, including some that may be unfamiliar. This paper seeks to tie together two very different fields, information systems and orality-literacy studies, each of which has something to offer the digital library designer. The authors have chosen an unconventional format for presenting this material. The paper contains two threads, one for each field it draws upon. Some sections belong to only one thread, while others belong to both. The paper can be read in different ways, but most readers will find it easiest to read the more familiar thread first, in order to contextualize the material, and then turn to the other thread. Figure 1 below illustrates the organization of the paper. (Note: the information systems thread is read down the left column and the orality-literacy thread down the right.)

Information systems thread (left)        Orality-literacy thread (right)
0. Description of the Paper (both threads)
IS1. Introduction                        OL1. Introduction
IS2. HOSS Architecture                   OL2. Orality, Literacy, and Hyperliteracy
3. Botanical Taxonomic Scholarship (both threads)
IS4. Technology Applications             OL4. Hyperliterate Work Practices
5. Conclusions (both threads)
Figure 1: Organization of the paper

IS1. Introduction

For many reasons, archaic work practices of varying "inappropriateness" to modern scholarship linger on despite their known flaws. In information-intensive fields, considerable support for the development of new work practices can be provided by digital libraries and the technologies underlying them. In particular, advanced distributed, computationally-oriented hypermedia systems, with their capability to support more fluid information structures, have often been proposed for use in fields where the mutable cognitive artifacts that scholars employ are known to be poorly reflected in the static artifacts produced by pre-electronic work practices for pre-electronic distribution methods.

IS2. HOSS Architecture

HOSS is a computationally-oriented hypermedia system [Nürnberg et al. 1996]. It consists of a hyperbase layer, a structure processing layer, a metadata manager layer, and an application layer. Each of these will be briefly described below.

The main difference between HOSS and other hypermedia systems is that HOSS is an entire operating environment. It provides file system, memory management, and scheduling features; other operating system functionality is provided by a SunOS 5.4 kernel. HOSS is best thought of as a hypermedia-aware operating system. An immediate consequence is that HOSS, like any operating system, admits an open set of application processes. Furthermore, just as all applications in a real-time operating system can take advantage of the operating system's real-time awareness, all HOSS applications have immediate access to hypermedia functionality: the functionality of the hyperbase and (open) structure processing layers is available to all HOSS processes.

IS2.1 Hyperbase Layer

A HOSS hyperbase is a process with two threads: a Versioned Object Manager (VOM) and an Association Set Manager (ASM). The VOM acts as a client of some Storage Manager (SM) that exists outside of the hyperbase. The VOM serves simple object and composite object abstractions and provides full versioning support for both [Hicks 1993]. The ASM is implemented as a client of the VOM, mapping the VOM abstractions to structural entity abstractions called associations and association sets [Leggett and Schnase 1994; Schnase 1992]. Because the ASM is a client of the VOM, it inherits versioning support for its abstractions as well.
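To illustrate how versioning can be inherited through layering, the following Python sketch models a VOM that never overwrites objects and an ASM that stores each association set as a VOM object. It is not the HOSS implementation or its interface: the class and method names (VersionedObjectManager, AssociationSetManager, associate, associations) are hypothetical, and the external Storage Manager is reduced to an in-memory dictionary.

```python
from __future__ import annotations

from typing import Any


class VersionedObjectManager:
    """Hypothetical VOM: every write appends a new version; earlier versions stay readable."""

    def __init__(self) -> None:
        self._history: dict[str, list[Any]] = {}  # stands in for the external Storage Manager

    def write(self, oid: str, value: Any) -> int:
        """Store a new version of object `oid` and return its 0-based version number."""
        versions = self._history.setdefault(oid, [])
        versions.append(value)
        return len(versions) - 1

    def read(self, oid: str, version: int | None = None) -> Any:
        """Return the latest version of `oid`, or the requested earlier version."""
        versions = self._history[oid]  # raises KeyError for unknown objects
        return versions[-1] if version is None else versions[version]


class AssociationSetManager:
    """Hypothetical ASM: a client of the VOM that keeps each association set as a
    VOM object, so every change to a set is versioned with no extra ASM code."""

    def __init__(self, vom: VersionedObjectManager) -> None:
        self._vom = vom

    def associate(self, set_id: str, source: str, target: str) -> int:
        """Add a (source, target) association; return the new version of the set."""
        try:
            current = self._vom.read(set_id)
        except KeyError:
            current = frozenset()
        return self._vom.write(set_id, current | {(source, target)})

    def associations(self, set_id: str, version: int | None = None) -> frozenset:
        return self._vom.read(set_id, version)


vom = VersionedObjectManager()
asm = AssociationSetManager(vom)
v0 = asm.associate("placements", "specimen:1", "taxon:Quercus")
v1 = asm.associate("placements", "specimen:2", "taxon:Quercus")
print(asm.associations("placements", v0))   # only the first association
print(len(asm.associations("placements")))  # 2
```

Because the ASM keeps no state of its own, any earlier version of an association set can be recovered through the VOM; this is a simplified rendering of the point that the ASM inherits versioning from the VOM.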

A HOSS hyperbase is conceptually similar to other hyperbase systems [Leggett and Schnase 1994; Schnase 1992; Shackelford et al. 1993; Schütt and Streitz 1990; Wiil 1993].

IS2.2 Structure Processing Layer

HOSS allows an open set of structure processors called Sprocs. All Sprocs are clients of the ASM, and they differ in the kinds of structure they manipulate. A key aspect of HOSS Sprocs is that they abstract behavior from structure [Nürnberg et al. 1996].

One example of a HOSS Sproc is the Link Services Manager (LSM). The LSM manages "traditional" hypermedia structure - namely, inter-application linking structure. It provides functions to create, navigate, manipulate, and destroy structure between application data. In the case of the LSM, behaviors correspond to the semantics of particular navigational structure traversals.
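The separation of behavior from structure can be sketched as follows, assuming a link Sproc that stores links as plain data and looks up traversal behaviors by name. All names (Link, LinkServicesSproc, follow_all, follow_bidirectional) are hypothetical; this illustrates the principle only and is not the LSM's actual protocol.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Link:
    """Hypothetical structural element: a typed link between two application anchors."""
    source: str
    target: str
    kind: str = "reference"


# A behavior is a computation over structure, registered separately from the links.
Behavior = Callable[[list[Link], str], list[str]]


def follow_all(links: list[Link], anchor: str) -> list[str]:
    """Directed traversal: every target one hop away from `anchor`."""
    return [link.target for link in links if link.source == anchor]


def follow_bidirectional(links: list[Link], anchor: str) -> list[str]:
    """Alternative semantics over the same links: treat them as undirected."""
    forward = [link.target for link in links if link.source == anchor]
    backward = [link.source for link in links if link.target == anchor]
    return forward + backward


class LinkServicesSproc:
    """Hypothetical LSM-like Sproc: owns the link structure; behaviors are pluggable."""

    def __init__(self) -> None:
        self.links: list[Link] = []
        self.behaviors: dict[str, Behavior] = {"follow": follow_all}

    def create(self, source: str, target: str, kind: str = "reference") -> Link:
        link = Link(source, target, kind)
        self.links.append(link)
        return link

    def destroy(self, link: Link) -> None:
        self.links.remove(link)

    def register_behavior(self, name: str, behavior: Behavior) -> None:
        self.behaviors[name] = behavior

    def navigate(self, anchor: str, behavior: str = "follow") -> list[str]:
        """Run the named behavior over the current structure."""
        return self.behaviors[behavior](self.links, anchor)


lsm = LinkServicesSproc()
lsm.create("field-notes.txt#p3", "taxon:Quercus alba")
lsm.register_behavior("both_ways", follow_bidirectional)
print(lsm.navigate("taxon:Quercus alba"))               # []
print(lsm.navigate("taxon:Quercus alba", "both_ways"))  # ['field-notes.txt#p3']
```

Registering a new behavior changes the traversal semantics without touching the stored links, which is the sense in which behavior is abstracted from structure in this sketch.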

The Taxon Manager (TaxMan) provides a second example of a HOSS Sproc. TaxMan acts as a client of both the VOM and the ASM, and serves taxonomic structural abstractions. These taxonomic abstractions are widely applicable. For example, botanical taxonomists use abstractions such as family, genus, species, etc. to classify plant specimens. Also, linguists develop linguistic taxonomies to represent the developmental histories of different languages.

Additionally, TaxMan provides a number of standard computations over taxonomic structures (i.e., behaviors). Examples of these behaviors include structure querying (e.g., find all family taxa that contain four genera with only one species each) and "structure collapsing" (e.g., collapsing species, subspecies, section, etc. into the genus taxa, and transferring the associations between specimen data and these collapsed taxonomic levels to the genera).
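The two TaxMan behaviors mentioned above can be sketched over a simple in-memory tree. This is a hypothetical Python illustration, not TaxMan's interface: the Taxon class, the rank strings, and the functions families_with_monotypic_genera, count_rank, and collapse_to are assumptions, and "four genera" is read here as exactly four genera at any depth below the family.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Taxon:
    """Hypothetical taxon node: a rank, a name, child taxa, and associated specimen IDs."""
    rank: str
    name: str
    children: list[Taxon] = field(default_factory=list)
    specimens: set[str] = field(default_factory=set)

    def walk(self) -> Iterator[Taxon]:
        """Yield this taxon and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def count_rank(taxon: Taxon, rank: str) -> int:
    """Number of taxa of the given rank in this subtree (including the root)."""
    return sum(1 for t in taxon.walk() if t.rank == rank)


def families_with_monotypic_genera(root: Taxon, n_genera: int = 4) -> list[str]:
    """Structure query: families containing exactly `n_genera` genera, each of which
    contains exactly one species (at any depth, e.g. nested under a section)."""
    hits = []
    for fam in root.walk():
        if fam.rank != "family":
            continue
        genera = [t for t in fam.walk() if t.rank == "genus"]
        if len(genera) == n_genera and all(count_rank(g, "species") == 1 for g in genera):
            hits.append(fam.name)
    return hits


def collapse_to(root: Taxon, rank: str) -> None:
    """Structure collapsing: remove every level below each `rank` taxon and move the
    specimen associations of the removed taxa up to that taxon (modifies in place)."""
    targets = [t for t in root.walk() if t.rank == rank]
    for target in targets:
        for descendant in target.walk():
            if descendant is not target:
                target.specimens |= descendant.specimens
        target.children = []
```

Collapsing is destructive here for brevity; in a versioned hyperbase one would presumably write the collapsed structure as a new version rather than modify the existing one.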

IS2.3 Metadata Manager Layer

Metadata managers are system processes that primarily serve abstractions to other system processes. They build the abstractions they serve from abstractions served by other metadata managers, Sprocs, and hyperbases. Metadata managers can be viewed as abstract data types, exporting data and functional abstractions.

IS2.4 Application Layer

Application processes are the user processes familiar from conventional operating systems, and the set of such processes is open. One application that has been built is a WWW Common Gateway Interface (CGI) [Berners-Lee et al. 1992] program that acts as a client of TaxMan, allowing users to query a taxonomic space, view the results, and annotate the records returned by the query. Another is a Motif/X [Nye 1988; Young 1990] client that supports graphical editing and manipulation of taxa.

IS2.5 Other Tools

A number of tools have been built for application, metadata manager, and Sproc construction [Nürnberg 1994]. The HCMT and HPMT toolkits provide certain process model and inter-process communication primitives. A tool called the PDC allows quick construction of servers by generating the necessary protocol libraries from high-level protocol specifications.

     

OL1. Introduction

For many reasons, archaic work practices of varying "inappropriateness" to modern scholarship linger on despite their known flaws. In information-intensive fields, possible new work practices can be suggested by distinguishing those aspects of current practice that are archetypic to the problem addressed from those that are artifacts of the technologies currently employed. In particular, we propose orality-literacy studies for this purpose in fields where the mutable cognitive artifacts that scholars employ are known to be poorly reflected in the static artifacts produced by pre-electronic work practices for pre-electronic distribution methods.

OL2. Orality, Literacy, and Hyperliteracy

Since the 1960s, an interdisciplinary research area within the humanities known as orality-literacy studies has examined differences in the modes of thought and expression of individuals in cultures exhibiting primary orality (where writing is not used as an adjunct to thought and memory) and those exhibiting pervasive literacy (where writing has become indispensable for those activities).

OL2.1 Orality and Literacy

A seminal work in orality-literacy studies is Preface to Plato by classicist Eric Havelock [1963], whose starting point is Plato's attack on poetry in the Republic [Waterfield 1993]. Plato's proposal that poetry be banned from his ideal state because it degraded the intellect strikes many modern students of Plato as odd. Havelock sets out to examine what this apparent oddity in the philosopher's thought implies about the cultural situation of Plato's Greece.

Havelock contends that the extensive body of common knowledge and worldviews required by classical Greek culture was encoded in the great poems of the time, most notably Homer's epics. To the ancient Greeks, these were a "tribal encyclopedia" of cultural ways and norms. Poetry was also well suited to the problems of information storage in a non-literate culture, namely retention in living memory and content-preserving transmission [Havelock 1963]. In essence, recitation of the epics could induce in reciters and listeners an almost hypnotic state that assisted correct remembrance. Poetry also encoded cultural knowledge situationally. Both of these properties were anathema to Plato, who was promoting reflective thought on the nature of abstractions. Plato's literacy allowed him to encode knowledge externally as a thing "in itself" and to examine concepts and their abstract structures without forgetting them. This, Havelock concludes, explains Plato's excoriation of poetry as a method of education and as an inhibitor of abstract speculation on the nature of the true, the good, and the beautiful. For our purposes, we note that Havelock showed that the consideration of ideas as eternal "things in themselves" is an artifact of literacy, not an archetypic aspect of thought.

                                             Orality                                Literacy
Ideas as... [Havelock 1963]                  properties of concrete situations      abstract and eternal "things in themselves"
Socially relevant truths as... [Ong 1982]    mutable objects                        fixed objects
Language use as... [Ong 1982]                requiring consideration of situation   manipulation of abstract placeholders
Table 1: Examples of Differences Between Orality and Literacy.

Among the other artifactual properties of literacy (examined in another seminal work of the field, Walter Ong's Orality and Literacy [Ong 1982]) is the notion of written truth as permanent truth. Today, it is common for material to be written down and remain unchanged for extended periods of time. If that material had some veracity when it was recorded, we tend to regard its "truth" as a permanent property that can be redemonstrated at any time. This is not the case with orally transmitted knowledge, which cannot be "recorded" except in living memory. As a result, material for which there is no call is forgotten, and changes to the material that confer some advantage will occur. Revisionism is reality in primary oral cultures; the beliefs that what is written retains its truth for all time and that, by extension, publication implies truth are artifacts of literacy.

OL2.2 Hyperliteracy

Many believe that we are entering an era in which electronic tools for storing and manipulating information will be considered indispensable for everyday thinking and remembering. Douglas Engelbart [1963] expressed this belief when he described a "certain progression of our intellectual capabilities": from concept manipulation (manipulation of concepts in the mind alone), to symbol manipulation (expression of concepts through language), to manual external symbol manipulation (manipulating linguistic symbols using writing), and finally to automated external symbol manipulation (manipulation of symbols using computers). Engelbart's second stage corresponds to the concept of "primary orality", and his third stage to "pervasive literacy". We extend the concepts of orality and literacy by positing a new property of culture, pervasive hyperliteracy (or simply hyperliteracy), corresponding to Engelbart's fourth stage.

Why posit hyperliteracy? If we are indeed entering an era where automated external symbol manipulation tools have become prerequisites of serious thought, then the designers of such tools should be interested in which aspects of thought are intrinsic to language-using human beings and which aspects are products of the use of non-electronic writing, since some of the latter may decrease in strength or disappear altogether in the residents of this new era. As can be seen from the above, these artifactual properties are not trivial, and they are precisely the concern of orality-literacy studies.

3. Botanical Taxonomic Scholarship

A curious aspect of some scholarly work practices is that they are often known to depend on false assumptions or oversimplifications of a problem. In some cases, such as certain economic models, these false assumptions are accepted as reasonable because they produce good results and make the models tractable.

In other cases, however, these false assumptions are simply products of tradition, based in part on artifacts of old technology and literate mindsets. We take as one very specific example our experiences with botanical taxonomists. For several years, we have worked with botanists to build a digital library of herbarium collection data. We have observed several common work practices change as our botanist colleagues gain access to new technology and re-evaluate those aspects of their old technology that dictated how they did their jobs. As a particularly good example of a current work practice dictated by the technology currently in use, consider that some journals use a standard taxonomy that everyone (including the journal editors!) acknowledges is outdated. The editors, however, are reluctant to correct the errors in this standard taxonomy, partly because the fixes are not universally agreed upon, but also because changing the taxonomy now would "invalidate" articles just published. The current common practice, then, is for researchers to carry out their work using a more realistic taxonomy and then literally "uncorrect" their terms to match the journal standard.

For reference, the object of taxonomic classification is the taxonomy, which consists of taxa, which themselves consist of other taxa or specimens. Taxa are composed hierarchically, and taxa at different levels of the tree have different rank names, such as family, genus, and species. We briefly describe three interesting problems we observed taxonomists encountering in their current work practices.

Different groups of taxonomists produce different taxonomies, even if the specimen set examined is identical. Groups in which particular specialists work on a given taxon may show more detail in the expansion of that taxon, or different groups may use different measures of similarity when composing taxa, weighting various kinds of evidence differently. It seems contradictory to have multiple solutions to a classification problem.

Separate taxonomic groups produce separate taxonomies, which are then identified with the groups that produced them. This is despite the fact that a taxonomy is typically used in conjunction with other taxonomies and is based on the prevailing attitudes in the community. It seems contradictory that a communally defined, communally used product is identified with a small set of taxonomists.

The products of the work are often complete taxonomies, not simply revisions to existing taxonomies. Whether updates or full new revisions, the products are viewed as closed, well-defined entities, representing the opinion of a group at some point in time. However, new evidence, new analysis methods, and new interpretations are constantly being introduced. It seems contradictory to produce a well-defined, static analysis of an ill-defined, dynamic phenomenon.

IS4. Technology Applications

Addressing the three examples of seeming contradictions in current work practices requires different supporting technologies than those present in the physical library. What is required here are new digital library elements and tools, not derived from physical antecedents. Of course, it is impossible to say what all of these technologies will be. This section outlines some possible technologies to begin to address these issues.

IS4.1 Single/Multiple Taxonomies

Two important capabilities that help address the single/multiple taxonomies problem are structure management and versioning. Hypertext structure management abstracts the structure over objects from the objects themselves. Often, this takes the form of abstracting traversal or navigational structure from the data to be navigated. However, the principle of structure abstraction can be applied in any realm in which multiple structures may be imposed on a given data set. This is precisely the case in taxonomic work: different taxonomies (structures) are built over the same specimen (data) set. Because TaxMan inherits the structure management abstractions of HOSS, including contexts (sets of structure elements and their associated behavior processes), it can use these contexts to partition the taxonomic data into consistent sets of taxa.
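A minimal sketch of multiple taxonomies over one specimen set, assuming each context is reduced to a mapping from taxon names to specimen IDs at a single rank. TaxonomyContexts, place, is_consistent, and taxon_of are hypothetical names, and the consistency rule (no specimen placed in two taxa of one context) is an assumption made for illustration, not TaxMan's definition.

```python
from __future__ import annotations


class TaxonomyContexts:
    """Hypothetical sketch: one shared specimen set, many taxonomies, each kept as a
    separate structure context (here simply taxon name -> set of specimen IDs)."""

    def __init__(self, specimens: set[str]) -> None:
        self.specimens = frozenset(specimens)  # data stored once, shared by all contexts
        self.contexts: dict[str, dict[str, set[str]]] = {}

    def place(self, context: str, taxon: str, specimen: str) -> None:
        """Record, within one context only, that `specimen` belongs to `taxon`."""
        if specimen not in self.specimens:
            raise KeyError(f"unknown specimen {specimen!r}")
        self.contexts.setdefault(context, {}).setdefault(taxon, set()).add(specimen)

    def is_consistent(self, context: str) -> bool:
        """Assumed rule: no specimen is placed in two taxa of the same context."""
        placed = [s for members in self.contexts[context].values() for s in members]
        return len(placed) == len(set(placed))

    def taxon_of(self, context: str, specimen: str) -> str | None:
        for taxon, members in self.contexts[context].items():
            if specimen in members:
                return taxon
        return None


store = TaxonomyContexts({"s1", "s2", "s3", "s4"})
for taxon, specimen in [("Genus A", "s1"), ("Genus A", "s2"), ("Genus B", "s3"), ("Genus B", "s4")]:
    store.place("group-1", taxon, specimen)
for taxon, specimen in [("Genus A", "s1"), ("Genus A", "s2"), ("Genus A", "s3"), ("Genus C", "s4")]:
    store.place("group-2", taxon, specimen)
print(store.taxon_of("group-1", "s3"), store.taxon_of("group-2", "s3"))  # Genus B Genus A
```

The two groups disagree about specimen s3, yet both classifications coexist over the same data, and neither overwrites the other.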

Because TaxMan is implemented on top of HOSS, it inherits HOSS's versioning support for both data and structural objects. This provides a natural way to model differences in a given taxonomy over time, as well as differences with respect to authority within the same time frame. Additionally, changes in the analysis of specimens (perhaps the addition of new pictures or new genetic information) can be added to the data set by versioning the appropriate specimen data object, so that taxonomies based on the older version of the object are not invalidated.
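The following sketch assumes that a taxonomy refers to specimens through version-pinned references, so that adding a new version of a specimen record leaves older taxonomies resolvable. SpecimenStore, Ref, and the record fields are hypothetical, and HOSS's actual versioning model [Hicks 1993] is richer than this append-only list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical version-pinned reference from a taxonomy to one specimen version."""
    oid: str
    version: int


class SpecimenStore:
    """Hypothetical versioned store: updates append versions and never overwrite."""

    def __init__(self) -> None:
        self._history: dict[str, list[dict]] = {}

    def put(self, oid: str, record: dict) -> Ref:
        versions = self._history.setdefault(oid, [])
        versions.append(dict(record))  # copy, so later edits by the caller do not leak in
        return Ref(oid, len(versions) - 1)

    def get(self, ref: Ref) -> dict:
        return self._history[ref.oid][ref.version]

    def latest(self, oid: str) -> Ref:
        return Ref(oid, len(self._history[oid]) - 1)


store = SpecimenStore()
old = store.put("specimen:42", {"images": 1})
taxonomy_1995 = {"Quercus alba": [old]}  # built on version 0
new = store.put("specimen:42", {"images": 2, "dna": "sequence data"})
taxonomy_1996 = {"Quercus alba": [new]}  # built on version 1

# The older taxonomy still resolves to exactly the data it was based on.
print(store.get(taxonomy_1995["Quercus alba"][0]))  # {'images': 1}
```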

IS4.2 Ownership of Taxonomies

One important capability that helps address the ownership of taxonomies problem is annotation support. An important aspect of maintaining and using community objects is annotating them and sharing those annotations. Such annotations can be used to judge the community's level of acceptance of a part of its body of knowledge, or to mark other particularly noteworthy aspects. Moderately sophisticated access control, search facilities, and filtering mechanisms over the annotation space should be provided. We have developed a HOSS Sproc named AnnoMan, which models sets of annotations as structure contexts and provides these features. Modeling annotations as structure in a hyperbase is straightforward: different structural elements (annotations) are laid over existing taxa and specimens (data), grouped into contexts, and managed by existing hyperbase software that can provide access control.
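A hypothetical sketch of annotations modeled as structure contexts with per-context access control and filtering. It does not reproduce AnnoMan: Annotation, AnnotationContexts, the tag values, and especially the acceptance score (distinct agreeing authors minus distinct disagreeing authors) are assumptions chosen only to show how shared annotations could help judge communal acceptance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation: a structural element laid over a taxon or specimen."""
    author: str
    target: str
    text: str
    tag: str = "note"  # assumed tag values: "note", "agree", "disagree"


class AnnotationContexts:
    """Hypothetical AnnoMan-like service: annotations grouped into contexts, each with
    an access list (None means readable by everyone)."""

    def __init__(self) -> None:
        self._annotations: dict[str, list[Annotation]] = {}
        self._readers: dict[str, frozenset[str] | None] = {}

    def create_context(self, name: str, readers: set[str] | None = None) -> None:
        self._annotations[name] = []
        self._readers[name] = None if readers is None else frozenset(readers)

    def annotate(self, context: str, annotation: Annotation) -> None:
        self._annotations[context].append(annotation)

    def visible(self, user: str, target: str | None = None,
                tag: str | None = None) -> list[Annotation]:
        """Apply access control first, then filter by target and tag."""
        result: list[Annotation] = []
        for name, annotations in self._annotations.items():
            readers = self._readers[name]
            if readers is not None and user not in readers:
                continue
            result += [a for a in annotations
                       if (target is None or a.target == target)
                       and (tag is None or a.tag == tag)]
        return result

    def acceptance(self, user: str, target: str) -> int:
        """Crude, assumed measure: distinct agreeing minus distinct disagreeing authors,
        counting only annotations that `user` is allowed to see."""
        agree = {a.author for a in self.visible(user, target, "agree")}
        disagree = {a.author for a in self.visible(user, target, "disagree")}
        return len(agree) - len(disagree)


anno = AnnotationContexts()
anno.create_context("public")
anno.create_context("lab", readers={"erich"})
target = "taxon:Quercus alba"
anno.annotate("public", Annotation("peter", target, "Placement looks right.", "agree"))
anno.annotate("public", Annotation("john", target, "Doubtful; see leaf data.", "disagree"))
anno.annotate("lab", Annotation("erich", target, "Agree after DNA check.", "agree"))
print(anno.acceptance("erich", target), anno.acceptance("john", target))  # 1 0
```

The same annotation space yields different views for different readers, because access control is applied per context before any filtering or aggregation.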

IS4.3 Definition of Taxonomies

Another important capability that helps address the definition of taxonomies problem is computation over hypermedia structure. The information in taxonomic research may be open in the sense that its boundaries are hard to define, especially outside of a particular context. However, dealing with documents that exhibit no sense of closure at all can be disorienting as well. What is needed is a way to view the open space as only "partially" open, that is, enforcing boundaries appropriate to a context while allowing these boundaries to be crossed or recomputed. One way to do this is to use computation over structure that dynamically generates closed sets of structure appropriate for a particular use.
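One way to compute a closed set from open structure is sketched below under stated assumptions: the structure is a list of typed, undirected edges, a context is approximated by the set of edge kinds it allows, and the boundary is a hop limit from a seed node. The function closed_view and the edge kinds are hypothetical, and breadth-first search is simply a convenient choice, not a technique attributed to HOSS.

```python
from __future__ import annotations

from collections import deque


def closed_view(edges: list[tuple[str, str, str]], seed: str, max_depth: int,
                allowed_kinds: set[str]) -> set[str]:
    """Compute a finite, closed set of nodes from open structure.

    Starting at `seed`, follow (undirected) edges whose kind is allowed in the
    current context, stopping after `max_depth` hops. Re-running with other
    parameters recomputes, and so crosses, the boundary."""
    adjacency: dict[str, list[str]] = {}
    for a, b, kind in edges:
        if kind in allowed_kinds:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # boundary reached: do not expand further
        for neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth)


edges = [
    ("genus:Quercus", "species:Quercus alba", "contains"),
    ("species:Quercus alba", "specimen:42", "classifies"),
    ("species:Quercus alba", "note:7", "annotates"),
    ("genus:Quercus", "genus:Lithocarpus", "related"),
]
print(sorted(closed_view(edges, "genus:Quercus", 2, {"contains", "classifies"})))
# ['genus:Quercus', 'species:Quercus alba', 'specimen:42']
```

Allowing "annotates" or "related" edges, or raising the hop limit, produces a different closed set over the same open structure.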

     

OL4. Hyperliterate Work Practices

Addressing the three examples of seeming contradictions in current work practices requires different artifacts than those present in the physical library with its literate artifacts. What is required here are new digital library elements and tools, not derived from physical antecedents [Nürnberg et al. 1995]. Of course, it is impossible to say what all of these artifacts will be. This section outlines some possible artifacts to begin to address these contradictions.

OL4.1 Single/Multiple Taxonomies

One artifact of literacy is the notion of single-valued, static truths [Ong 1982]. The work practice of developing and publishing taxonomies separately from one another is a particular instance of this artifact. The product of this work is a taxonomy, a "taxonomic fact" or truth, presented and interpreted as such. However, the notion of truth is changing from the literate view of static and single-valued to the hyperliterate view of dynamic and multi-valued. Consider the Guides project approach to teaching history, in which various personas contextualize history from particular points of view [Solomon et al. 1989]. The "truth" of the matter is a space in which various points of view are represented. This contrasts sharply with the literate notion of the authority of the book as conveyor of a single, coherent message [Chartier 1994]. Perhaps, instead of viewing the primary goal of a taxonomist as the generation of a new taxonomy, which must then be related to previous and competing taxonomies by the consumer, the product may be viewed as a change to the existing body of knowledge. In essence, taxonomists already view the purpose of their work in this way, but the actual product of their work, the printed taxonomy, is only a means to this end. Reconciliation and contextualization remain the responsibility of the consumer.

OL4.2 Ownership of Taxonomies

Literacy promotes the concept of idea ownership by the individual, even when the idea represents a communally held truth. In this case, taxonomies are identified with their producers or publishers, and a taxonomy by itself offers no way to recognize its communal context. However, the notion of authorship is changing from owner of a document (and by extension its ideas) to recorder of ideas that are the product of several people, past and present. Consider an analogy from the business world: the growing role of the analyst [Reich 1991]. The analyst provides a filtering or ordering function for data that is often already available. Many new companies no longer focus on the production of information but on its compilation. This reflects a situation in which the problem of information is what to do with its overabundance (the "information explosion"), not how to find and retrieve data [Chartier 1994].

OL4.3 Definition of Taxonomies

One artifact of literacy is the closure of ideas. The product of taxonomic work is a well-defined, discrete entity. Yet products need no longer be closed: they may exist as entities that change over time, with poorly defined borders. Consider Web sites with links to many other sites. These sites have no closure per se; where one chooses to draw boundaries is defined contextually and individually. This is in opposition to the closure engendered by books and other written entities [Chartier 1994]. As above, one new possibility is a communally maintained set of taxa, with various notes, modifications, and addenda separately maintained over these taxa. The boundaries of the communal knowledge could then be determined only by a given consumer at a given moment.

5. Conclusions

The information systems thread of this paper asserted the existence of new work practices in botanical taxonomic scholarship enabled by new technologies. The new work practices, however, were assumed to arise spontaneously due to problems found in current work practices.

The orality-literacy thread of this paper motivated why certain new work practices might arise in botanical taxonomic scholarship, but did not offer any particular ways to cope with them.

The digital library will have to support people's new work practices. The changes in such practices must first be identified; we extended orality-literacy studies to hyperliteracy in an attempt to characterize these changes. The new practices will also have to be supported by new technologies; we showed systems and tools able to support the needs of one particular research community. The threads in this paper, therefore, must rely upon one another, one for motivation and the other for prototypic solutions. We see this as a microcosm of the digital libraries research field, a field in which results from many different and dissimilar areas will need to be synthesized to produce the research necessary to redesign the tools with which people think.

6. References

[Berners-Lee et al. 1992] Berners-Lee, T. J., Cailliau, R., Groff, J. F., and Pollermann, B. (1992). World-Wide Web: The information universe. Electronic Networking: Research, Applications and Policy, 2 (1), 52-58.

[Chartier 1994] Chartier, R. (1994). The Order of Books: Readers, Authors, and Libraries in Europe Between the Fourteenth and Eighteenth Centuries. Stanford, CA: Stanford University Press.

[Engelbart and English 1968] Engelbart, D. C., and English, W. (1968). A research center for augmenting human intellect. AFIPS Conference Proceedings, 1968 Fall Joint Computer Conference, San Francisco, CA.

[Havelock 1963] Havelock, E. (1963). Preface to Plato. Cambridge, MA: Belknap Press.

[Hicks 1993] Hicks, D. (1993). A Version Control Architecture for Advanced Hypermedia Environments. Ph.D. dissertation, Department of Computer Science, Texas A&M University.

[Leggett and Schnase 1994] Leggett, J., and Schnase, J. (1994). Viewing Dexter with open eyes. Communications of the ACM, 37 (2), 76-86.

[Nürnberg 1994] Nürnberg, P. (1994). Implications of an open, extensible, and distributed hypermedia information system architecture for inter-process communication subsystem design. M.S. thesis, Department of Computer Science, Texas A&M University.

[Nürnberg et al. 1995] Nürnberg, P., Furuta, R., Leggett, J., Marshall, C., and Shipman, F. (1995). Digital Libraries: Issues and Architectures. Proceedings of the Digital Libraries '95 Conference, Austin, TX.

[Nürnberg et al. 1996] Nürnberg, P., Leggett, J., Schneider, E., and Schnase, J. (1996). Hypermedia Operating Systems: A New Paradigm for Computing. Proceedings of the Hypertext '96 Conference, Bethesda, MD.

[Nye 1988] Nye, A. (1988). Xlib Programming Manual for Version 11. Sebastopol, CA: O'Reilly and Associates Inc.

[Ong 1982] Ong, W. (1982). Orality and Literacy: the Technologizing of the Word. New York: Methuen.

[Reich 1991] Reich, R. (1991). The Work of Nations: Preparing Ourselves for 21st Century Capitalism. New York: A. A. Knopf.

[Schnase et al. 1993] Schnase, J., Leggett, J., Hicks, D., Nürnberg, P. and Sanchez, J. (1993). HB1: Design and implementation of a hyperbase management system. Electronic Publishing: Origination, Dissemination and Design, 6 (1), 35-63.

[Schütt and Streitz 1990] Schütt, H., and Streitz, N. (1990). HyperBase: a hypermedia engine based on a relational database management system. Proceedings of ECHT '90, Versailles, France.

[Shackelford et al. 1993] Shackelford, D., Smith, J., and Smith, F. (1993). The architecture and implementation of a distributed hypermedia storage system. Proceedings of the Fifth ACM Conference on Hypertext (Hypertext '93), Seattle, WA.

[Solomon et al. 1989] Solomon, G., Oren, T., and Kreitman, K. (1989). Using Guides to explore multimedia databases. Proceedings of 22nd Hawaii International Conference on System Sciences, Kailua-Kona, HI.

[Waterfield 1993] Waterfield, R. (trans.) (1993). The Republic of Plato. New York: Oxford University Press.

[Wiil 1993] Wiil, U. (1993). Experiences with HyperBase: A multiuser hypertext database. SIGMOD Record, 22 (4), 19-25.

[Young 1990] Young, D. (1990). The X Window System: Programming and Applications with Xt. OSF/Motif Edition. Englewood Cliffs, NJ: Prentice-Hall.

Acknowledgements

This research was supported in part by the Texas Advanced Research Program under Grant No. 999903-230.