Making Large-Scale Information Resources Serve Communities of Practice

Catherine C. Marshall, Frank M. Shipman III, and Raymond J. McCall

Abstract

Community memory can provide the crucial bridge between large-scale information bases like digital libraries and the day-to-day activities of a community's members. Just as a digital library is based on a general structure and conventional means of access to diverse collections of materials, a community memory will help cull and shape the structure and contents of these collections to meet more particular needs. But it is by no means straightforward for people to collect, maintain, share, and apply the materials that are part of a community memory. Useful and usable community memories require support for: (1) the acquisition and evolution of content and structure; (2) the identification of materials and community members relevant to a particular task; and (3) the maintenance of organizations that are mutually intelligible across the community. In this paper, we explore issues related to these three requirements based on a meta-analysis of our collective experiences with the development and use of shared hypermedia information resources.

Key words and phrases: community memory, electronic community support, hypermedia, information spaces, digital libraries, collaboration, shared understanding

1. Introduction

In principle, digital libraries and large-scale information bases will provide physically distributed electronic communities access to a broad spectrum of archival materials, including those that we currently find in public, community, and work group repositories. But how will these communities bring ever-increasing electronic resources to bear on their work? How will people use digital libraries in their day-to-day activities? How will they apply these emerging collections to information-intensive intellectual tasks -- research, design, education, analysis -- work that requires information to be gathered, understood, and communicated to others? To make this wealth of on-line resources truly useful to emerging electronic communities, we must forge a link between the distributed repositories and the practices of information workers who will use them.

Community memories will form this vital link between large-scale collections and information-intensive work. Just as a digital library will provide a general structure and means of access to a collection of materials, a community memory will enable this library to serve the actual day-to-day needs of community members. It culls and indexes the library, evaluating and interpreting its contents according to those needs. In fact, the creation, maintenance, and use of this memory as a shared resource largely defines the community and constitutes the central vehicle for communication among community members. By augmenting members' abilities to perform tasks, use of community memory may provide the central motivation for membership in a community.

1.1 Community memory and electronic communities

When people work together -- whether they are designing a product, or creating training materials from video-based documentation, or writing a coherent analysis of a complex situation in the world -- they require, and put effort into constructing and maintaining, shared understandings of what they are doing: the task, the pertinent body of material, preliminary findings, progress, and methods. We refer to the open-ended set of collective knowledge and shared understandings developed and maintained by the group as community memory.

Electronic communities are different from physical communities. They are ephemeral, forming and reforming according to interests, particular tasks, or issues. A person may be a member of many electronic communities, shifting his or her virtual presence from one locus of activity to another with ease. Our experience with electronic communities has drawn us to investigate ways of supporting these communities in the same medium in which they have formed.



Figure 1. A conceptualization of community memory


Figure 1 illustrates our conceptualization of community memory in a distributed electronic community. Materials are drawn from many repositories, some archival, some transient (like wire service stories, electronic mail, and listserv contributions) and, through discourse and interaction, combined with artifacts relating to the task at hand to form the shared understandings that are community memory. Shared understandings in turn become the basis for communication and further work. Thus community memory may include discourse, collected materials, answers to frequently asked questions, evaluations of these materials ("This is an important article") or sources ("This news group has valuable information; this other one is a waste of time"), as well as marginalia and annotations, alternative organizations of materials, filters, and well-tuned queries.

1.2 Electronic community memory at work: early examples

We can already see community memory at work in on-line communities on the Internet. Beyond providing access to distributed computing resources and remote information, the Internet has proven to be a particularly effective vehicle for human-human communication. It is the means by which electronic communities form and transient collections of materials grow in association with a task or topic. These collaborations, and their associated community memory, have the capacity to greatly extend the reach of the individual.

Augmenting the intellectual capacity of the individual through support for collaborative construction of knowledge is not a new idea; Engelbart saw such a potential with Augment [5]. Currently, this phenomenon is an important side effect of having the infrastructure in place to provide communities with extensive digital resources and improved connectivity [25].

For example, in recent years researchers at different sites have used this infrastructure to organize their conferences; they have discussed individual papers, the program, and decisions about conference content while referring to a set of submissions and, implicitly, to the body of literature in the field. In effect, through their conversations, they have formed a shared understanding about the current state of the field. In other cases, a topic (rather than a task) helps maintain a community: for example, high-energy physicists exchange preprints through the World-Wide Web as a way of shortening the review and publication cycle.

NSF Collaboratory projects have acknowledged the feasibility of distributed communities in information-rich domains collaborating on line. Schatz's account of the worm community discusses one such experience, including the characteristics of an on-line community that make it amenable to this approach, and the kinds of formal and informal artifacts that the worm community found it valuable to collect, organize, and annotate [24].

1.3 A spectrum of resources

Physical collections exist to serve communities of many different sizes -- from a few to thousands of members -- and for every variation along this range, a community memory can and often does exist. Even individuals frequently have their own external memories in the form of highlighting and annotations scribbled on the pages of the books and articles in their personal collections. A group might collect a variety of formal materials like specifications and informal materials like email and meeting notes to build up a project notebook. Patent attorneys working for a large company might enhance their understanding of documents retrieved from a formal intellectual property database with informal videotaped interviews with inventors. Network-wide community interactions over collections might result in something like World-Wide Web home pages, where individual members of the community increase the value of existing materials by creating their own links to "useful resources" or "favorite starting places." Notice that in each of these cases, community memory acts not only as a filter, but more importantly as a superstructure to the more general information resource.

Prospective developers of digital libraries are planning to include community memory-like facilities to make tomorrow's electronic resources better serve the distributed communities that use them in their work [9]. Proposed digital library functions include: guided tours and automatically recorded reader paths; the ability to self-publish and move material in and out of the library at will; sharing of annotations; and voting schemes (as illustrated, for example, by Goldberg et al.'s Tapestry system [10]). Digital media and improved network connectivity make it much easier to collect these superstructural elements that rest on top of large-scale information resources.

Community memory introduces an important difference between the underlying information resources and the superstructure we discuss. While both physical and electronic information resources rely on the existence of fixed roles like publishers, librarians, readers, and writers [22], this kind of external memory generally implies far more fluid roles. At any given moment, a member of a community may act as contributor of information to the community memory, as recipient of information from it, or as interpreter of information in it. In the following sections, we take each of these roles as a separate vantage point and examine the issues and challenges each role raises.

1.4 Challenges

We thus find community memory to be a linchpin of the effective performance of intellectual work. But our thirty or so years of collective experience developing and deploying systems to support the elicitation and reuse of community memory, augmented by reflections on our own experiences with network-wide collaborations, have shown us that there are significant barriers to realizing a fully articulated, well organized, usable electronic community memory.

Building useful and usable memories for distributed communities presents fundamental challenges [32, 2]. Although it appears to be easy for a group to amass the kinds of materials that are part of a community memory -- for example, electronic mail, culled, annotated library materials, "war stories" about how prototypical problems were solved in past situations, software that embodies a particular way of processing digital library information, or videotapes documenting design meetings -- it remains unclear how to put these materials to productive use over time.

Community memories need to be seeded, maintained, and generalized; they need to reflect the evolution of shared understanding. Members of the community must be mutually aware of each other's contributions, and the contributions must be mutually intelligible. Effective community memories cannot exist in isolation from either the tasks at hand or the information resources they refer to. Finally, and most crucially, they have to be useful to the members of the community: they must contribute directly to the work activities.

2. Community Memory: Issues of Contributing, Obtaining, and Using Interpretive Information

We are looking at community memory as a shared interpretive layer on top of sifted subcollections that refer to materials taken from both within and outside of digital libraries. To provide technological support for community memory, we must examine the situations from which it arises and the challenges we have encountered in designing systems to support it.

How do people use community memory as a resource for performing intellectual work? First, they find the materials they need for their work (many times by consulting colleagues, assistants, librarians, experts, and other human resources); they read or otherwise apprehend portions of materials they've gathered; finally, they modify these materials to suit the purposes at hand, where modification may include synthesis of diverse sources, paraphrasing, quoting outright, or using the gathered information as a taking-off point. Thus, to perform information-intensive intellectual work, a member of a community will contribute as well as extract.

We will take each of these roles as a separate vantage point, and examine the issues and challenges raised by each. Because the materials must be in place before they may be used, we first take the contributor's perspective. Once these materials exist as a community-maintained electronic resource, we can begin to examine how people locate the relevant portions. Finally, we take a reader's view: how the materials may be understood outside of a prescribed, pre-defined structure. We ground our discussion in smaller-scale experiences with hypermedia systems, since hypermedia is a good representational medium for creating community memory. We use these experiences to make informed speculations about how the issues revealed by these systems in use scale to tomorrow's much more extensive electronic resources.

2.1 Creating and sustaining community memory: a contributor's perspective

Although it is easy to amass materials for community memory, it is difficult to provide the incentive to add the requisite organization that will make the shared resource useful to others [20, 4, 7]. In general, this difficulty is intrinsic to certain types of groupware: contributors' efforts may far outweigh the benefits they derive from the work [12]. Many existing efforts to provide group memory or support long-term community-wide discussion have found that without an individual's single-minded devotion to starting them, keeping them going, and maintaining them, the information space slowly dies and becomes irrelevant, even to its originators. It is difficult to ensure real, continuing participation that goes beyond the casual browsing we might encounter today on the Web.

The difficulties of acquiring community memory are exacerbated by both technological and social factors. First, contributors often don't derive benefits commensurate with the amount of effort they expend: there is a large gap between the collected materials they've used in their work (their files, for example) and materials that have been organized so that others may profitably use them. Not only does the structure of these materials arise over time and in conjunction with particular tasks, but any additional structure brings with it a considerable amount of overhead [28]. Second, as a changing, evolving form, community memory requires continuing thoughtful maintenance to weed out growing inconsistencies and redundant contributions. Finally, community memory arises out of tasks that take place in a distributed, heterogeneous environment, one that involves paper as well as digital media (see for example the description of analytic work in [14]), multiple authoring tools, and many different collections of source material, retrieved from a multiplicity of information services, each with its own formats, access methods, and protocols; this blend of materials, media, and technology presents significant obstacles to the construction of a by-product like community memory.

2.1.1 Internal sources for community memory: emergence of structure through incremental formalization.

Through our work with representational hypermedia tools like Aquanet, VIKI, and HOS, we have shown that the groupware cost/benefit paradox may be amenable to two kinds of solution: methods that support the gradual emergence and evolution of structure, and techniques that support incremental formalization [18, 17, 29]. These tools and techniques emphasize low-cost means of adding the kinds of structure that may organize information from a digital library into a community memory.

Aquanet [16] is a good example of a group tool that suffered from the cost/benefit paradox inherent to community memory systems. One of Aquanet's principal roles was to act as a collaborative front-end for the exploratory manipulation and organization of large collections of documents relevant to a particular task; in particular, we had hoped people would work together to create large, tightly interlinked structures of argumentation and evidence in the course of performing long-term analyses. These structures would encourage people to develop multiple interpretations of large collections of always-changing, possibly conflicting materials and would form a shared interpretive layer over institutional databases and commercial information services.

Aquanet provided specific support for users to create and manipulate complex graphical knowledge structures in the form of a schema editor for defining structured types of information objects and an infinite two-and-a-half-dimensional information space in which to create and manipulate instances of these types. In our original conception of the knowledge structuring task, users would define graphical representations of the elements in their problem domain and specify (and constrain) all the ways in which these elements could be interconnected. They would then apply and change these structuring schemes or abstractions over the course of their tasks. Thus Aquanet was intended to provide a flexible way for people to record the abstractions they use to interpret information, to reflect on and critique their analytic frameworks, and to explicitly negotiate about how information is structured (all crucial elements of a successful community memory).



Figure 2. A shared information space in Aquanet

Figure 2 shows a portion of a shared information space that was created in Aquanet, one constructed during the course of an evaluation of foreign language translation software and analysis of the field of machine translation. Each distinct visual symbol in the figure refers to an article from the trade press, notes about a software package, contact information for a company, or a label for other elements. Thus, schematic abstractions in this application included Systems, Institutions, Labels, and Articles (among others not captured in this framing of the space). But notice that the structural interconnections are absent from the picture; users relied on proximity and visual/spatial patterns in the layout to convey interrelationships. Much of the structure we would expect to be defined and created in the information space has been left implicit.

If we consider Figure 2 as an example of superstructure, built on top of a large-scale information resource (in this case, primarily articles from a commercial information provider), the contents of the instances of Article types are the external material, drawn from the information provider's database. The other types of information objects (the notes about Systems and Institutions, and the Labels for the graphical layout) and the distinctive appearance of the Article type can all be thought of as part of the community memory.

This application of the Aquanet system illustrates a crucial point about anticipating a high degree of structure from contributors to a community memory. People find the definition, refinement, and use of sophisticated domain descriptions difficult, and insufficiently rewarding for the effort. Instead, they will create a locally useful amount of structure -- in the application shown in Figure 2, it was useful to be able to distinguish source articles from notes, and to create parallel types of notes on each different kind of system -- and omit more formal definition of domain structure.

Thus, our experiences showed us that informal (and in this case, visual/spatial) representations are crucial to coaxing out partially formed, emerging interpretations. One of Aquanet's unexpected uses was to express interpretations that were less than fully formed, in terms of visual appearance or position in the shared space. Extra-linguistic means of expression proved to be vital, allowing categories to be created without labels and relationships between documents to be expressed visually; for example, sets could be created by putting references to particular documents close together, overlapping in the space. The kinesthetic process of "trying things out" (as one might do shuffling papers into piles across every horizontal surface in one's office) was not eliminated simply because a person was using a computer instead of manipulable paper objects in the world.

How can we apply these lessons to develop a system to support a more informal, emergent facility for contributing to a community memory?

Out of our experiences with Aquanet, we designed VIKI, a tool to support emergent, dynamic, exploratory interpretation [17]. VIKI supports the ad hoc use of a visual symbol language so people can see and express structure as it becomes apparent to them. In contrast with Aquanet, developing this language is well-integrated with the task at hand. Because interpretation -- along with the concomitant act of organizing materials -- is opportunistic, users are not confined to a particular working style; they may work from gathered subcollections from an existing information resource to develop structure, they can work schematically (the mode Aquanet enforced), or they may leave structure and meaning largely implicit. VIKI complements the ability to develop abstractions and reflect on and critique interpretive frameworks with the flexibility offered by ad hoc, visually salient representations. We see support for emergent structure as a partial solution to the cost/benefit paradox inherent in computer support for community memory.



Figure 3. Portion of a shared space in VIKI

Figure 3 shows a small portion of a shared space that a group of contributors has created in VIKI. The space acts as a community memory for the group by providing a place to collect, structure, and annotate materials that pertain to the group's research activities. It includes internally-authored materials describing individual projects (a portion of this is visible in the figure), intellectual property documents such as Invention Proposals and patent filings connected with the projects, and papers and short articles about competing products and projects drawn from a commercial information provider's databases. In the portion of the subspace that is visible in Figure 3, the structures that people have built are notably similar to those shown in our Aquanet example -- visually salient, regular, but not wholly conforming to a pattern, a mixture of typed and untyped entities. The distinct structures in the figure are, in fact, labelled lists of projects. VIKI includes facilities for recognizing, using, and declaring this kind of implicit structure. Visual structure is built up and becomes the basis for sharing knowledge. By using heuristics for automated recognition of the same kind of structure that humans perceive in a spatial layout, we can support the gradual (and, from the user's point of view, cost-effective) emergence of structure.
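VIKI's own recognition heuristics are not spelled out here, so the following is only a hypothetical, much-simplified Python sketch of the idea: it treats left-aligned objects stacked with small vertical gaps as an implicit list. The `SpaceObject` type, the coordinate convention (y grows downward), and the `x_tolerance` and `max_gap` thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpaceObject:
    """Hypothetical object in a 2-D workspace; y grows downward."""
    name: str
    x: float
    y: float
    width: float
    height: float


def find_vertical_lists(objects: List[SpaceObject], x_tolerance: float = 5.0,
                        max_gap: float = 10.0, min_size: int = 2) -> List[List[str]]:
    """Group objects into candidate implicit lists, returned as lists of names.

    An object joins a list when its left edge is within x_tolerance of the
    list's first member and its top lies at most max_gap below the bottom of
    the list's current last member.
    """
    ordered = sorted(objects, key=lambda o: (o.y, o.x))  # scan top to bottom
    used = [False] * len(ordered)
    lists = []
    for i, start in enumerate(ordered):
        if used[i]:
            continue
        used[i] = True
        members = [start]
        for j in range(i + 1, len(ordered)):
            if used[j]:
                continue
            candidate, last = ordered[j], members[-1]
            gap = candidate.y - (last.y + last.height)
            if abs(candidate.x - start.x) <= x_tolerance and 0 <= gap <= max_gap:
                members.append(candidate)
                used[j] = True
        if len(members) >= min_size:
            lists.append([o.name for o in members])
    return lists


layout = [
    SpaceObject("Projects", 0, 0, 80, 10),
    SpaceObject("Proj A", 2, 12, 80, 10),
    SpaceObject("Proj B", 0, 24, 80, 10),
    SpaceObject("Invention Proposal", 200, 0, 120, 10),
]
print(find_vertical_lists(layout))  # [['Projects', 'Proj A', 'Proj B']]
```

The recognized list is only a candidate; in keeping with the emphasis on low-cost structure, a tool along these lines would presumably offer it to the user to use or declare rather than imposing it.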

Since it has become clear from our experiences that people within a small working community are capable of sharing implicitly structured material, and that, given the opportunity, they have difficulty making such structure explicit, why are these structured representations and the mechanisms to help people use them even necessary?

As we discuss later on in Section 2.3, structure helps keep community memories intelligible to the members of a broader community by giving them the means to understand how the contributions of others fit into the community memory. But more importantly, formal structure is also computationally tractable, raising the possibility of computer support for a community's activities. With the Hyper-Object Substrate (HOS), we have investigated the process of incremental formalization to support the emergence of structure. To this end, HOS integrates hypermedia and knowledge-based representations. Hypermedia eliminates many of the cognitive costs of formalization that inhibit user input. Integration with a formal knowledge representation reduces the burden of formalization by allowing it to be distributed and making it demand-driven [29].

To further lower the cost of formalizing information, HOS actively supports incremental formalization with mechanisms to recognize emergent structure implicit in the community memory and suggest formalizations based on this structure (a content-based analog to VIKI's visual/spatial structure perception). Experience with HOS indicates some success, and suggests that both the methods for producing possible formalizations and the interfaces for suggesting them merit further investigation.
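The text does not detail HOS's recognition mechanisms, so here is a hypothetical Python sketch of one content-based heuristic in the same spirit: when informal notes repeatedly contain "Name: value" lines, the recurring name is suggested as an attribute that could be formalized. The regular expression, the `min_support` threshold, and the function names are assumptions, not HOS's actual design.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches informal field-like lines such as "Vendor: Systran".
FIELD_LINE = re.compile(r"^[ \t]*([A-Z][A-Za-z ]{0,30}?)[ \t]*:[ \t]*(\S.*)$", re.MULTILINE)


def extract_fields(text: str) -> Dict[str, str]:
    """Pull name/value pairs out of free text written in a field-like style."""
    return {name: value.strip() for name, value in FIELD_LINE.findall(text)}


def suggest_attributes(nodes: Dict[str, str], min_support: int = 2) -> Dict[str, List[str]]:
    """Suggest attribute names that recur in at least min_support nodes.

    Each suggestion lists the nodes it was found in, so a user can accept,
    rename, or reject the proposed formalization.
    """
    support = defaultdict(set)
    for node_id, text in nodes.items():
        for name in extract_fields(text):
            support[name].add(node_id)
    return {name: sorted(ids) for name, ids in sorted(support.items())
            if len(ids) >= min_support}


notes = {
    "n1": "Systran evaluation\nVendor: Systran\nLanguages: French, English",
    "n2": "Vendor: Logos\nPrice: unknown",
    "n3": "Meeting recap, no fields here.",
}
print(suggest_attributes(notes))  # {'Vendor': ['n1', 'n2']}
```

Requiring support from several nodes is meant to keep suggestions demand-driven: a pattern that appears only once is left as informal text.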

2.1.2 Lifecycle of community memory: seeding, evolutionary growth, and reseeding.

We have observed three major types of processes -- and stages -- in the life cycle of community memories: seeding, evolutionary growth and reseeding [8]. Seeding is the creation of the initial body of information in community memory. When this initial set of information reaches a certain size and level of relevance to the community, it starts to grow and evolve spontaneously as the result of additions made by its users (as has happened in the World-Wide Web). Seeding ends with the start of this evolutionary growth. After this growth proceeds for some time, the memory starts to become less and less useful; as a consequence, both use and growth may diminish. This happens for a number of reasons, such as growing disorder in the memory and the "needle in the haystack" problem -- i.e., the increasing difficulty of finding useful information in the growing information collection. At this point, the community memory must be revised -- i.e., reseeded. Its contents must be organized, winnowed, prioritized and generalized. The methods for locating things in memory may themselves need to be altered. If this reseeding is done successfully, the system can start another stage of evolutionary growth, after which it will in turn need to be reseeded if it is to continue to serve its users.

We have repeatedly experienced this three-fold process in our attempts to build community memories, for example, with large Issue-Based Information Systems (IBIS) structures [20]. Very few IBISs for groups have gotten started without the dedication of a single person or small core group of people who were willing to create the seed: i.e., the initial set of issues, positions and arguments. We have found that attempting to get the IBIS users themselves to invent -- out of the blue -- relevant issues, answers and arguments is a frustrating and generally unproductive experience for all concerned. Once there is some argumentative discussion for users to react to, the situation changes dramatically. It is easy to get people to react to what others have said, and the difficulty changes from trying to elicit information to trying to keep up with the information elicited. In our experience, this change makes it quite clear when the evolutionary growth stage of an IBIS has begun.

We found that as an issue base grows, its maintenance becomes increasingly difficult and error prone; growth leads to increasing disorder in the issue base. We also found that it became increasingly difficult to locate relevant information. These two problems have devastating synergies. For example, a given issue would often be raised and stored repeatedly, typically with slightly different wording. These redundancies were very difficult to detect, in part because of the difference in wording. Thus group discussion became fragmented into parallel discussions. As time went on the fragmentations grew in number and even compounded themselves -- with branches of the fragmented discussions in turn becoming fragmented. As a result, the IBIS increasingly ceased to function as a vehicle for group communication. To restore it to functionality, it was necessary to reseed the IBIS through a comprehensive edit of the issue base.
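As one possible aid to reseeding, the following hypothetical Python sketch flags pairs of issues whose wording overlaps, using Jaccard similarity over content words. The stopword list and the 0.5 threshold are assumptions. Because the text notes that redundancies were hard to detect precisely because wording differed, a purely lexical measure like this would catch only the more obvious duplicates; it is a starting point for a human editor, not a replacement for one.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Assumed stopword list; a real system would tune this for its domain.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "for", "should", "we",
             "be", "is", "how", "what", "which", "do", "does"}


def content_words(text: str) -> Set[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_duplicates(issues: Dict[str, str],
                         threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (id, id, score) for every issue pair at or above threshold."""
    words = {issue_id: content_words(text) for issue_id, text in issues.items()}
    pairs = []
    for (i, wi), (j, wj) in combinations(sorted(words.items()), 2):
        score = jaccard(wi, wj)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


issues = {
    "I1": "Should the network use fiber backbone?",
    "I2": "What backbone should the network use?",
    "I3": "Who pays for the training?",
}
print(candidate_duplicates(issues))  # [('I1', 'I2', 0.75)]
```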

We have also observed this three-fold process in the creation and development of a number of large software systems, such as Symbolics' Genera, Unix, and the X Window System. In such systems, after the creation of the initial versions of the systems (seeding), users developed ad hoc additions to system functionality and often shared these as a community (evolutionary growth). These additions were often winnowed, refined, combined and included in later official versions of the software (reseeding), after which the systems entered another stage of ad hoc additions to functionality (evolutionary growth).

2.1.3 External sources for community memory: connections to information resources.

Attempts to create community memory seem certain to fail if the memory is not connected to large-scale external sources of information, such as distributed, networked repositories, as well as to the communications facilities of other members of the community [21, 14].

The assumption implicit in the design of many computer-based tools is that communication is a separate process from the user's main task. An analysis of computer network designers showed how the logical map, a representation of the design which shows network device interconnections, acted as the central artifact around which most communication occurred [23]. In response, XNetwork, an environment for supporting network design, provides designers with an integrated view of the design and the discussions about the design in conjunction with methods for importing electronic mail and bulletin board discussions into the design space. Figure 4 illustrates how the discussion and the design can be created and viewed together in XNetwork. The need to integrate discussion and artifact signals a more general need to integrate communication with the information it concerns.



Figure 4. An integration of discussion and design in XNetwork
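XNetwork's import mechanism is not described in detail, so the following is a hypothetical Python sketch of how imported mail might be attached to the design: each message is linked to every design element whose name it mentions. It uses only the standard `email` package; the element names and the plain substring match are assumptions.

```python
from collections import defaultdict
from email import message_from_string
from typing import Dict, List


def attach_messages(raw_messages: List[str],
                    design_elements: List[str]) -> Dict[str, List[str]]:
    """Map each design element to the subjects of messages that mention it."""
    discussion = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        subject = msg.get("Subject", "")
        # Only plain single-part messages are handled in this sketch.
        body = "" if msg.is_multipart() else msg.get_payload()
        text = f"{subject}\n{body}".lower()
        for element in design_elements:
            if element.lower() in text:
                discussion[element].append(subject)
    return dict(discussion)


mail = [
    "From: ann@example.com\nSubject: Router-3 placement\n\nShould Router-3 move to the east closet?\n",
    "From: bob@example.com\nSubject: Budget\n\nBridge-1 and Router-3 are over budget.\n",
]
print(attach_messages(mail, ["Router-3", "Bridge-1", "Hub-7"]))
```

Plain substring matching would also link "Router-30" to "Router-3"; a fuller version would likely match on word boundaries or on explicit references created in the design space.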


Just as community memories must be connected with the means of communication about their content, so must they be connected with the universal collections from which they arise. For example, prototypes of the Virtual Notebook System (VNS) [26, 11] used generic hypermedia to overcome the difficulty of integrating various sources of biomedical research information. The VNS was intended as an electronic analog to a researcher's notebook that could also act as a shared repository of information gathered from early digital libraries and other on-line sources. Such external information resources included the National Library of Medicine's Medline database containing bibliographic and abstract information on articles from medical journals. Users could connect to the Medline database through a graphical interface and could easily "paste" interesting information into their hypertext for later use.

Experience with these early prototypes of the VNS shows the difficulty of providing the needed connections to a variety of information sources and media. In addition to the Medline connection, researchers required that the VNS include interfaces to organizational information resources, i.e., hospital and departmental information systems, as well as to their research information resources, such as genome and experimental databases. As these examples show, the specific information resources used by a particular community can differ greatly in scope. The experimental databases were used by only one research group; the genome database was shared by a number of groups; and the hospital and departmental information systems were used by most of the staff within the institutions. Furthermore, as time goes on, information needs change, and the kinds of external resources that are available grow.

Thus, if we examine community memory from a contributor's viewpoint, we find that it is necessary to provide support for emergent structure, for maintaining both content and structure over the lifecycle of the community memory, and for easily extending the reach of the community memory to include new external sources of content and artifacts for discussion. But supporting contributors is only part of the picture; once the materials have coalesced into a usable superstructure to universal information resources, how do other members of the community recover materials from (and through) the community memory? In the next section, we will assume the perspective of the person who is using this facility.

2.2 Obtaining useful information from community memory: a user's perspective

At first it might seem that obtaining information from community memory is a classic information retrieval problem and thus amenable to treatment by many existing information retrieval techniques. There are, however, decisive differences which existing retrieval techniques do not address. One such difference is the nature of the information needs--the gaps in knowledge--of community members; another is the role of community in mediating retrieval. Addressing these requires new approaches.

2.2.1 "You don't know what you don't know": mechanisms for active memory.

Community members are often unaware that information they need is in community memory. In fact, they are often unaware that they need information of any kind. The underlying principle is that you don't know what you don't know. Since knowledge about knowledge is called meta-knowledge, we might call this the principle of meta-ignorance. Because of it, users do not know when to pose queries to the memory, much less how to formulate such queries.

Active memory. Conventional information systems--i.e., those based on information retrieval principles--do not provide information unless queries are posed to them. Because they can only react to deliberate user requests, such systems do not enable community memory to fulfill its potential in informing the tasks of community members. Instead, what is needed is memory that actively suggests information to users on the basis of some sort of understanding of their information needs. To do this, memory must have active agents that "look over the shoulders" of community members as they work and spot potential needs for information, then alert users to the existence of information of potential use for their current tasks.

Such active memory systems will not rely solely or even primarily on content-based retrieval as do systems that use information retrieval techniques. They will instead index information by the tasks for which it is useful. We call this task-based indexing. This is not to say that content-based retrieval that responds to explicit queries will not play an important part in obtaining information from community memory. It will, but its role will be to supplement task-based retrieval.
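To make task-based indexing concrete, the following is a minimal Python sketch, not a description of any system discussed here. It assumes that items are tagged with the tasks they serve when they are contributed and that some mapping from observed user actions to tasks exists; the class `TaskIndexedMemory`, the function `watch`, the `ACTION_TO_TASK` table, and the sample notes are all hypothetical. Content-based search appears only as a secondary `query` method, reflecting its supplementary role.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical mapping from observed user actions to the tasks they indicate.
ACTION_TO_TASK = {
    "place_router": "choose-router",
    "draw_cable": "plan-cabling",
}


@dataclass
class TaskIndexedMemory:
    """Hypothetical community memory that indexes items by the tasks they serve."""
    items: dict = field(default_factory=dict)                        # item id -> content
    by_task: dict = field(default_factory=lambda: defaultdict(set))  # task -> item ids

    def add(self, item_id, content, tasks):
        self.items[item_id] = content
        for task in tasks:
            self.by_task[task].add(item_id)

    def suggest(self, task, already_seen):
        """Task-based retrieval: items useful for the task that the user has not yet seen."""
        return sorted(self.by_task.get(task, set()) - already_seen)

    def query(self, term):
        """Supplementary content-based retrieval: case-insensitive substring match."""
        return sorted(i for i, c in self.items.items() if term.lower() in c.lower())


def watch(memory, action, already_seen):
    """Agent that 'looks over the shoulder': volunteers items when an action implies a task."""
    task = ACTION_TO_TASK.get(action)
    if task is None:
        return []  # no recognizable information need, so stay silent
    return memory.suggest(task, already_seen)


memory = TaskIndexedMemory()
memory.add("n1", "Router X overheats in wiring closets", ["choose-router"])
memory.add("n2", "Cat5 runs over 100 m fail", ["plan-cabling"])
print(watch(memory, "place_router", set()))  # ['n1']
```

The agent never receives a query; the user's action alone determines what is volunteered, which is the property that distinguishes active memory from conventional retrieval.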

The JANUS system supporting design uses the relationships between domain-oriented construction kits and a domain-oriented issue base to integrate argumentative information into the task of constructing solution forms [7]. JANUS employs knowledge-based critics that "look over the designer's shoulder" and critique partially constructed solutions, pointing out potential inadequacies and providing relevant rationale from a domain-oriented issue base.

Generalizing from JANUS's critics, XNetwork includes agents to support the recovery of relevant information from a community memory. Like JANUS's critics, XNetwork's agents volunteer information or take some action based on the user's current actions. XNetwork agents can be created by designers to act as proponents of certain information and opinions. As part of this creation, the designer can select among methods for informing potential recipients of the information; the agents can be more or less intrusive depending on the nature of the content. Agents thus act as surrogates for users, advertising the existence of important information. In this way the agents support communication among the members of a community.
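XNetwork's agents are described here only at the level of behavior. The sketch below illustrates that behavior under stated assumptions: an agent is modeled as a trigger predicate over the user's current action, a message, the community member it speaks for, and an author-chosen level of intrusiveness. The names `ProponentAgent`, `Intrusiveness`, and `dispatch`, the three levels themselves, and the sample agents are hypothetical and do not reflect XNetwork's actual interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Intrusiveness(Enum):
    """Hypothetical delivery levels, from least to most intrusive."""
    LOG = 1        # recorded quietly for the user to browse later
    NOTIFY = 2     # non-modal notice shown alongside the work
    INTERRUPT = 3  # demands attention before the user continues


@dataclass
class ProponentAgent:
    author: str                      # community member the agent speaks for
    trigger: Callable[[dict], bool]  # predicate over the user's current action
    message: str                     # information or opinion the agent advertises
    mode: Intrusiveness              # chosen by the author to suit the content


def dispatch(agents, action):
    """Return (mode, author, message) for each agent whose trigger fires, most intrusive first."""
    fired = [(a.mode, a.author, a.message) for a in agents if a.trigger(action)]
    return sorted(fired, key=lambda t: t[0].value, reverse=True)


agents = [
    ProponentAgent("ray", lambda act: act.get("object") == "hub",
                   "Shared hubs may saturate on busy segments.", Intrusiveness.NOTIFY),
    ProponentAgent("frank", lambda act: act.get("cost", 0) > 1000,
                   "Purchases over budget need review.", Intrusiveness.INTERRUPT),
]
for mode, author, message in dispatch(agents, {"object": "hub", "cost": 1500}):
    print(mode.name, author, message)
```

Keeping the author on each agent preserves the link back to a person, so the agent advertises information and also identifies someone to talk to.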

2.2.2 Community as information agency.

Perhaps the central point about obtaining information within communities of practice is that informed people are frequently the best source of information [6]. This function of community as information agency--i.e., as mediator of retrieval--is in fact one of the primary reasons for its existence. Supporting this function is thus decisive for the creation of successful electronic communities of practice.

Community memory can serve two crucial functions in helping people to find information. First of all, it can serve as a cache for that information and evaluations of its worth, thus reducing the difficulty of search and increasing its effectiveness. Secondly, it can serve as a means for identifying community members who either know the information or can help in locating it. Community memory might, in fact, consist in large part of explicit records of the knowledge of the individuals in the community. This knowledge can be stored in a number of ways, perhaps the most basic of which is frequently asked questions (FAQs). In fact, an IBIS on recurring issues can be seen as nothing more than a souped-up FAQ collection.

As research on IBIS hypermedia has shown, the problem of retrieving issues is by no means merely a conventional information retrieval problem. Above all, it requires more than retrieval by content or bibliographic reference. Retrieval of relevant information in complex question-based discussions is decisively aided by associative indexing -- i.e., indexing by the relationships among questions [21]. For one thing, answering a query (question/issue) might be aided by the answers given to similar queries. The answer might also depend on the answers given to other queries. Such similarity and dependency relationships are also valuable information that can aid retrieval.
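Associative indexing can be pictured as retrieval over a graph of questions rather than over a list of documents. The following minimal sketch assumes just two link types, similarity (symmetric) and dependency (directed), and collects answers from issues reachable within a bounded number of links, nearest first. The `IssueBase` class, its methods, and the sample issues are illustrative, not the retrieval mechanism of any IBIS system cited here.

```python
from collections import deque


class IssueBase:
    """Hypothetical IBIS-style store with associative links between questions."""

    def __init__(self):
        self.answers = {}     # issue -> list of answers
        self.similar = {}     # issue -> set of similar issues (kept symmetric)
        self.depends_on = {}  # issue -> set of issues whose answers it depends on

    def add_issue(self, issue, answers=()):
        self.answers.setdefault(issue, []).extend(answers)

    def link_similar(self, a, b):
        self.similar.setdefault(a, set()).add(b)
        self.similar.setdefault(b, set()).add(a)

    def link_dependency(self, issue, prerequisite):
        self.depends_on.setdefault(issue, set()).add(prerequisite)

    def related_answers(self, issue, max_hops=2):
        """Breadth-first walk over both link types; returns (hops, issue, answer), nearest first."""
        seen = {issue}
        frontier = deque([(issue, 0)])
        results = []
        while frontier:
            current, hops = frontier.popleft()
            for answer in self.answers.get(current, []):
                results.append((hops, current, answer))
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            neighbors = self.similar.get(current, set()) | self.depends_on.get(current, set())
            for nxt in sorted(neighbors):  # sorted for deterministic output
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, hops + 1))
        return results


ib = IssueBase()
ib.add_issue("Which cable?")  # not yet answered directly
ib.add_issue("Which cable for labs?", ["Cat5"])
ib.add_issue("What bandwidth is needed?", ["100 Mb/s to desks"])
ib.link_similar("Which cable?", "Which cable for labs?")
ib.link_dependency("Which cable?", "What bandwidth is needed?")
```

Here the unanswered question still yields useful material: an answer to a similar question and an answer to a question it depends on. Neither would be found by matching the question's own content against a conventional index.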

Most of the knowledge of community members is not and cannot be stored in community memory. Even so, a community memory can still be a decisive aid to retrieval of such knowledge if it can guide the question-asker to the community member who has the knowledge. There are at least two ways in which community memory can be of help in this situation. One is by storing the questions that its members want answered, so that other members can become aware of these information needs. The other is by storing information about the types of knowledge possessed by its various members -- i.e., who knows what types of things. Ackerman's Answer Garden system takes this approach [1]. Community members may themselves be the best guide for finding other knowledgeable community members.
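Both strategies, publishing questions and recording who knows what, can be sketched together as a simple directory. This is a hypothetical illustration rather than Answer Garden's implementation. The class `ExpertiseDirectory` and its ranking rule (the number of a question's topics a member has claimed) are assumptions chosen for simplicity.

```python
from collections import defaultdict


class ExpertiseDirectory:
    """Hypothetical 'who knows what' store plus a publicly visible list of open questions."""

    def __init__(self):
        self.experts = defaultdict(set)  # topic -> members who claim knowledge of it
        self.open_questions = []         # (asker, question, topics) visible to all members

    def register(self, member, topics):
        for topic in topics:
            self.experts[topic].add(member)

    def ask(self, asker, question, topics):
        """Publish the question, then rank other members by how many of its topics they cover."""
        self.open_questions.append((asker, question, tuple(topics)))
        coverage = defaultdict(int)
        for topic in topics:
            for member in self.experts.get(topic, ()):
                if member != asker:
                    coverage[member] += 1
        # Highest coverage first; ties broken alphabetically for a stable order.
        return sorted(coverage, key=lambda m: (-coverage[m], m))


directory = ExpertiseDirectory()
directory.register("alice", ["medline", "genome"])
directory.register("bob", ["genome"])
directory.register("carol", ["hospital-db"])
candidates = directory.ask("dan", "How do I link Medline records to genome entries?",
                           ["medline", "genome"])
```

Posting the question before routing it means that members who were not selected can still see the information need and volunteer an answer.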

2.3 Comprehension of information from community memory: an interpreter's perspective

Once community members obtain information from memory, they attempt to use it for specific tasks. From the perspective of such an information user -- an interpreter of the structure and contents of the community memory -- there are two fundamental challenges. The first is to comprehend the information; the second is to apply it to the task at hand. For the former, a crucial issue is whether the representation of the information that seemed appropriate to the contributor is also appropriate for the user's current task. For the latter, the most crucial issue is whether the user will be able to reformulate and generalize the materials to apply them to the current situation.

2.3.1 Achieving a shared understanding: enabling metacommunication.

Community memory critically rests on the idea that any one community member's contribution to such a shared resource is intelligible to other members of the community. But how do we ensure the intelligibility of material that results from a task that is not necessarily accessible in time (community memory is usually an asynchronous form of communication) or place (as we have seen, electronic communities are distributed groups)? Our past efforts have focused on two different tactics to make shared spaces mutually intelligible: meta-discussions within a space [13,15] to discuss the materials it contains, and shared representations that structure and organize the materials [19, 16, 27]. Yet the problem becomes much harder to solve as the community memory grows in size; rationale for the content and structure of the shared resource becomes opaque and inaccessible over time.

Realistically, some portion of emerging structure (and structure is continually emerging) will always be implicit. In systems to support collaborative intellectual work like NoteCards [31] and VIKI, the strategy to achieve mutual intelligibility has been to encourage contributors to explicitly record discussions about the work.

NoteCards is a hypertext-based information-organizing tool originally intended for individual use, but once a user community emerged, it became apparent that many tasks people were performing using the tool -- writing papers, managing projects, collecting and analyzing information -- were in fact group activities. As a result of this observation, NoteCards developers added facilities to support collaborative work [13]. Three of the more important facilities were: History Cards, tailored event-centered record keeping that could be annotated by collaborators; Guided Tours, a technique that allowed a presentation structure to be overlaid on a hypertext network; and TableTops, a means of contextualizing work by allowing a number of cards to be grouped as a visual composite [33].

Each of these mechanisms involved a semi-automatic way of recording changes or state (for example, TableTops recorded which cards were together on the screen, including scrolling). What we learned is that these recordings of paths, process, or state need to be supplemented by human annotation [15]. This need for human annotation (or communication about a group information resource) has been confirmed by our experiences with subsequent systems. VIKI provided no explicit mechanisms for recording change history, so collaborators developed conventions (electronic post-its) for communicating their changes to each other.

There is some perception that support for strong typing of materials will naturally bring about coherence; we have not found this to be the case. Aquanet relied on domain schemas to make contributions self-organizing and self-documenting, thereby rendering them intelligible to other group members; if one contributor creates, for example, a claim as part of an argument, the contribution's type (along with the role it plays in a community-defined structure) would allow other group members to interpret it. This strategy is based on two important assumptions: (1) people understand the meaning of the meta-schematic description and use it in a uniform way and (2) people fully use the schematic structures, and leave little implicit. In practice, neither of these assumptions has been found to hold. Collaborators still found themselves discussing the abstractions and how they ought to be applied. They also left a great deal implicit (including why a particular element should occupy a specified position in the shared space), thereby introducing a great deal of ambiguity and inconsistency.

Structure recognition and incremental formalization techniques may be useful in finding implicit structure and making it available for discussion within a community of practice; the implicit structure does not need to be declared, only located. Specific support for conversations about recognizable implicit structures may help members of a community keep their own contributions to the shared resource intelligible.
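As one example of locating rather than declaring implicit structure, the sketch below looks for a pattern common in spatial hypertext: objects aligned on a shared left edge and stacked with small gaps, which people often use as an informal list. It is a simplified, hypothetical recognizer and does not reproduce VIKI's actual spatial parser. The `Node` type and the tolerance parameters `x_tol`, `max_gap`, and `min_len` are assumptions. Its output is meant to be surfaced for discussion, not imposed as a formal type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def find_vertical_lists(nodes, x_tol=5.0, max_gap=15.0, min_len=3):
    """Return names of node runs that share a left edge (within x_tol) and are stacked
    with gaps of at most max_gap; such runs are candidate implicit lists."""
    # Cluster nodes into columns by left edge, scanning in x order.
    columns = []
    for n in sorted(nodes, key=lambda n: n.x):
        if columns and abs(columns[-1][0].x - n.x) <= x_tol:
            columns[-1].append(n)
        else:
            columns.append([n])

    lists = []
    for col in columns:
        col.sort(key=lambda n: n.y)
        run = [col[0]]
        for prev, cur in zip(col, col[1:]):
            gap = cur.y - (prev.y + prev.h)
            if 0 <= gap <= max_gap:
                run.append(cur)  # close enough below the previous node: same list
            else:
                if len(run) >= min_len:
                    lists.append([n.name for n in run])
                run = [cur]      # start a new candidate run
        if len(run) >= min_len:
            lists.append([n.name for n in run])
    return lists
```

A tool built along these lines would highlight each returned run and invite collaborators to confirm, rename, or reject it, so the structure stays implicit until the community chooses to discuss it.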

2.3.2 Situatedness and task specificity: representational fluidity.

The function of community memory is to inform the various information-intensive tasks that community members undertake. Since the information from memory can be represented in many alternative ways, we must ask which representation is appropriate. The most basic answer is that it depends on the nature of the tasks that communities of practice undertake.

The nature of community memory as a shared resource suggests that a given portion of the information might serve many different tasks. Thus, if we take use of community memory as reuse of information, we must consider two different types of reuse: reuse of the information itself (through generalization and reapplication) and reuse of the abstractions that structure this information. We look first at techniques for generalizing the materials themselves.

Generalization is a process in which details are removed and the resulting information is, in part, abstracted from its original context so that it may be applied to other situations. Generalizations are created with an expectation of future use. Different generalizations will be appropriate for different future situations. For example, in our experience with network design, the same design can be used as an example in situations with similar budgetary considerations and in situations using similar technology [8]. Providing fluid representation, where information can evolve in both structure and use, can facilitate such generalization. For example, XNetwork allowed designers to continually add and remove structure from the representation of the design and to make copies of the design available as more general examples within the community memory.

Since contributors cannot completely predict the situation of their audience, it is difficult to know how much background to provide to make their interpretations and knowledge useful at a later date [1]. There are at least two possible ways of addressing this problem. One is to record the context in which the contributions to memory were created. This is, however, limited because the contributors also cannot know which information about context is likely to be useful, and much of the information about context is likely to remain in the form of tacit background knowledge. The second way of addressing the problem is to provide links from contributions to the contributors, so that users of information might communicate with contributors to elicit more information about this background (on a demand-driven basis).

We now turn our attention to the abstractions used to structure the materials -- the meta-schematic descriptions of domains of interest. One of the original motivations for providing this kind of abstraction is the ability to reapply it to interpret related materials. We found this kind of reuse may be difficult to support with tools that do not acknowledge the fluidity of abstraction, since the structures people define are based on an idealization of the task and of the materials and may not fit well with the contingencies of the actual situation [30, 28].

For example, in our experiences performing a long-term analysis task that involved assessing machine translation systems (see [17]), we found that the abstract types that highlighted certain technical aspects of the systems (like the approach they took to translation of natural language) were not entirely appropriate for a seemingly similar task of identifying candidate Spanish-English translation software for purchase. The new task required that aspects like cost and hardware platform be made perspicuous, while the old task did not call for explicit structural representation of these characteristics. In general, fixed representations of domain structure tend to cause material that doesn't quite fit into the abstractions to get lost, to drop from sight. This problem with the application of abstractions would surely be amplified as a community memory grew and encompassed more materials and more related tasks.

We addressed this problem in our later work by assuming that representations are fluid, lightweight, and locally-defined for the task. The appropriateness of different representation schemes for different tasks suggests that the "raw" information in community memory needs to be separable from the manner in which it is represented (as, for example, a view of the underlying materials rather than a property of the materials themselves). Representational variability will allow a given piece or collection of information to be generalized and reapplied, and the abstractions that structure it to be modified and reused.
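Separating raw materials from their task-specific representation can be sketched as a view function that projects the same stored items through different sets of facets. Following the machine translation example, one view foregrounds translation approach while another foregrounds cost and platform. The `Material` type, the `view` function, and the sample systems and attributes are hypothetical. The design choice worth noting is that items fitting none of a view's facets are reported in a separate bucket rather than silently dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Material:
    """Hypothetical raw item: free-form notes plus whatever attributes happen to be known."""
    name: str
    notes: str = ""
    attrs: dict = field(default_factory=dict)


def view(materials, facets):
    """Project materials through task-specific facets without modifying the materials.

    Returns (rows, unfitted): rows pairs each name with its facet values; items that have
    none of the facets are listed in unfitted instead of disappearing from sight.
    """
    rows, unfitted = [], []
    for m in materials:
        row = {f: m.attrs.get(f) for f in facets}
        if all(value is None for value in row.values()):
            unfitted.append(m.name)
        else:
            rows.append((m.name, row))
    return rows, unfitted


catalog = [
    Material("System A", "vendor demo notes", {"approach": "transfer", "cost": 500}),
    Material("System B", "paper summary", {"approach": "interlingua"}),
    Material("System C", "hallway conversation, details unknown"),
]
technical_rows, technical_unfitted = view(catalog, ["approach"])
purchase_rows, purchase_unfitted = view(catalog, ["cost", "platform"])
```

Because each view is computed rather than stored, a new task can define its own facets without restructuring the underlying materials or invalidating views that other members rely on.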

3. Conclusions

Large-scale information resources, digital libraries and other significant and authoritative on-line repositories, are under development, ready to act as testbeds for communities of practice. Much of the research focus thus far has been on the technological basis for an infrastructure to provide storage and access to this wealth of materials. But we also must attend to the superstructure -- community memory and similar forms of use-directed indexing, annotation, and informal augmentation -- of these resources to make them serve the needs of people engaged in information-intensive intellectual work.

By taking three different perspectives on this kind of superstructure, those of a contributor, a user, and an interpreter of information, we can begin to realize the breadth of issues entailed by supporting communities who use large-scale information resources in their day-to-day activities. Just by looking at the World-Wide Web [3], which is itself a blend of repository and interpretive superstructure, we can see that these issues are close at hand. Contributors add materials with little consistent representation of structure; they often have difficulty maintaining their contributions and ensuring the overall hygiene of the Web; and they put effort into constructing new bridges to other on-line resources. Users are, as predicted, unaware of materials and of other members of the community who can help them do their work; they must explore, probe, and become truly engaged in the Web community's discourse before they can find their way around the extensive resources it offers. Finally, the reader's ability to comprehend and interpret Web materials relies crucially on the contributor's adherence to emerging on-line document genres and structural conventions; knowledge of these genres and conventions on the part of both readers and authors is the surest basis for interpreting the hypertext.

From our collective experiences, we have identified several techniques that will support communities of practice in their use of these emerging large-scale information resources. First, taking a contributor's perspective, we propose structure recognition, incremental formalization, and representational flexibility to address the problems of evolving and often implicit structure. Furthermore, we suggest that developers include superstructural support for the process of seeding, evolutionary growth, and reseeding [8] to help address maintenance issues. Because distributed communities use distributed, heterogeneous resources, any community memory must provide ready connectivity and an open architecture.

As we have shown, information retrieval will not be sufficient to meet the needs of communities using large-scale information resources. A superstructure may also be used to support different methods for recovering information, like techniques for active recovery of material from both the community memory and the underlying information bases and explicit support for community-mediated location. In effect, aspects of librarianship may be distributed among members of the community.

Finally, problems of comprehension, mutual intelligibility, and reuse of materials can be addressed by supporting human-annotated records of process, and by expecting fluid, highly-situated representational forms.

By applying this collection of techniques, the emerging wealth of large-scale information resources can be put to work by groups and communities, and truly make a difference in the way we conduct our day-to-day lives.

References

1. Ackerman, M.S. Augmenting the Organizational Memory: A Field Study of Answer Garden. In Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, Chapel Hill, North Carolina (October 22-26, 1994), pp. 243-252.

2. Berlin, L.; Jeffries, R.; O'Day, V.L.; Paepcke, A.; Wharton, C. Where Did You Put It? Issues in the Design and Use of a Group Memory. Proceedings of InterCHI '93, Amsterdam, The Netherlands (April 24-29, 1993), pp. 23-30.

3. Berners-Lee, T.; Cailliau, R.; Luotonen, A.; Nielsen, H.F.; Secret, A. The World-Wide Web. Communications of the ACM, 37, 8 (August 1994), pp. 76-82.

4. Conklin, E.J., and Yakemovic, K.C. A Process-Oriented Approach to Design Rationale. Human Computer Interaction 6, 3-4 (1991), pp. 357-391.

5. Engelbart, D. Collaboration support provisions in AUGMENT. Proceedings of the AFIPS Office Automation Conference, Los Angeles (February 1984), pp. 51-58.

6. Ehrlich, K., and Cash, D. Turning Information into Knowledge: Information Finding as a Collaborative Activity. In Proceedings of Digital Libraries '94, College Station, Texas (June 19-21, 1994), pp. 119-125.

7. Fischer, G.; Lemke, A.; McCall, R.; and Morch, A. Making Argumentation Serve Design. Human Computer Interaction 6, 3-4 (1991), pp. 393-419.

8. Fischer, G.; McCall, R.; Ostwald, J.; Reeves, B.; and Shipman, F. Seeding, Evolutionary Growth, and Reseeding: Supporting the Incremental Development of Design Environments. Proceedings of CHI '94, Boston, Mass. (Apr. 24-28, 1994), pp. 292-298.

9. Fox, E.A. Source Book on Digital Libraries. Version 1.0, Prepared for and Sponsored by the National Science Foundation. Blacksburg, VA: Virginia Polytechnic Institute and State University (December 6, 1993).

10. Goldberg, D.; Nichols, D.; Oki, B.M.; and Terry, D. Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM, 35, 12 (December 1992), pp. 61-70.

11. Gorry, G.A.; Burger, A.; Chaney, J.; Long, K.; and Tausk, C. Computer Support for Biomedical Research Groups. In Proceedings of the Conference on Computer-Supported Cooperative Work, Portland, Oregon (Sept. 26-28, 1988), pp. 39-51.

12. Grudin, J. Why CSCW Applications Fail: Problems in the Design and Evaluation of Organizational Interfaces. In Proceedings of the Conference on Computer-Supported Cooperative Work, Portland, Oregon (Sept. 26-28, 1988), pp. 85-93.

13. Irish, P.M., and Trigg, R.H. Supporting Collaboration in Hypermedia: Issues and Experiences. Journal of the American Society for Information Science, (March, 1989).

14. Levy, D.M., and Marshall, C.C. Going Digital: A Look at Assumptions Underlying Digital Libraries. To appear in Communications of the ACM.

15. Marshall, C.C., and Irish, P.M. Guided Tours and On-Line Presentations: How Authors Make Existing Hypertext Intelligible for Readers. Hypertext '89 Proceedings, Pittsburgh, Penn. (Nov. 5-8, 1989), pp. 15-26.

16. Marshall, C.C.; Halasz, F.G.; Rogers, R.A.; and Janssen, W. Aquanet: a Hypertext Tool to Hold Your Knowledge in Place. In Proceedings of Hypertext '91, San Antonio, Texas (Dec. 15-18, 1991), pp. 261-275.

17. Marshall, C.C.; Shipman, F.M.; and Coombs, J.H. VIKI: Spatial Hypertext Supporting Emergent Structure. In Proceedings of the European Conference on Hypermedia Technologies (ECHT '94), Edinburgh, Scotland (Sept. 18-23, 1994), pp. 13-23.

18. Marshall, C.C., and Rogers, R.A. Two Years before the Mist: Experiences with Aquanet. In Proceedings of European Conference on Hypertext (ECHT '92), Milan, Italy, (Dec. 1992), pp. 53-62.

19. McCall, R. On the Structure and Use of Issue Systems in Design. Doctoral Dissertation, University of California, Berkeley, University Microfilms, 1979.

20. McCall, R.; Schaab, B.; and Schuler, W. An Information Station for the Problem Solver: System Concepts. In Applications of Mini- and Microcomputers in Information, Documentation and Libraries, C. Keren, L. Perlmutter, Eds. New York: Elsevier, 1983.

21. McCall, R.; Bennett, P.; d'Oronzio, P.; Ostwald, J.; Shipman, F.; and Wallace, N. PHIDIAS: Integrating CAD Graphics into Dynamic Hypertext. In Proceedings of the European Conference on Hypertext (ECHT'90), Paris, France (Nov., 1990), pp. 152-165.

22. McKnight, C.; Meadows, J.; Pullinger, D.; and Rowland, F. ELVYN -- Publisher and Library Working Towards the Electronic Distribution and Use of Journals. In Proceedings of Digital Libraries '94, College Station, Texas (June 19-21, 1994), pp. 6-11.

23. Reeves, B.N., and Shipman, F.M. Supporting Communication between Designers with Artifact-Centered Evolving Information Spaces. In Proceedings of the Conference on Computer Supported Cooperative Work (CSCW '92), Toronto, Canada (Oct. 31-Nov. 4, 1992), pp. 394-401.

24. Schatz, B.R. Building an Electronic Community System. Journal of Management Information Systems. 8, 3 (Winter 1991-92), pp. 87-107.

25. Schuler, D. Community Networks: Building a New Participatory Medium. Communications of the ACM, 37, 1 (January 1994), pp. 39-51.

26. Shipman, F.M.; Chaney, R.J.; and Gorry, G.A. Distributed Hypertext for Collaborative Research: The Virtual Notebook System. In Proceedings of Hypertext '89, Pittsburgh, Penn. (Nov. 5-8, 1989), pp. 129-135.

27. Shipman, F.M. Supporting Knowledge-Base Evolution with Incremental Formalization. Technical Report CU-CS-658-93, Department of Computer Science, University of Colorado, Boulder (1993).

28. Shipman, F.M., and Marshall, C.C. Formality Considered Harmful: Experiences, Emerging Themes, and Directions. Xerox PARC Technical Report ISTL-CSA-94-08-02 (1994).

29. Shipman, F.M., and McCall, R. Supporting Knowledge-Base Evolution with Incremental Formalization. In Proceedings of CHI '94, Boston, Mass. (Apr. 24-28, 1994), pp. 285-291.

30. Suchman, L.A. Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge, UK: Cambridge University Press, 1987.

31. Suchman, L.; Trigg, R.; and Halasz, F. Supporting Collaboration in NoteCards. In D. Marca and G. Bock (Eds.) Groupware: Software for Computer-Supported Cooperative Work, Los Alamitos, CA: IEEE Computer Society Press, 1992, pp. 394-403.

32. Terveen, L.; Selfridge, P.G.; and Long, M.D. From 'Folklore' to 'Living Design Memory'. In Proceedings of InterCHI '93, Amsterdam, The Netherlands, (24-29 April 1993), pp. 15-22.

33. Trigg, R. Guided Tours and Tabletops: Tools for Communicating in a Hypertext Environment. ACM Transactions on Office Information Systems, 6, 4 (October, 1988), pp. 398-414.