Digital Biodiversity
The Flora of Texas Project

submitted August, 1999 to the
Texas Higher Education Coordinating Board
Advanced Research Program

Project Summary:

The Internet offers new options for both development and presentation of complex data. Information previously provided from a single source within the static, limited context of printed documents can now be merged from multiple sources, dynamically updated, and expanded in many ways. The digital medium provides new opportunities to facilitate public access, enhance interpretation via the incorporation of graphic elements and photographic images, and also manage complex information for presentation to audiences at different levels of expertise. Research proposed here builds on initial efforts of botanical research centers in Texas to create and maintain a digital Flora of Texas. This will include refinement and expansion of plant specimen label data capture systems, deployment of these to new data gathering sites, development of data translation programs, refinement and elaboration of extant web-based data access systems, development and curation of distributed plant image libraries, and exploration of methods that can be employed to produce a unified database for the Texas flora that includes images, specimen data, and text (keys, descriptions). This collaboration involving six institutions will be centered on continued development and enhancement of an herbarium specimen browser system that is now available through the Flora of Texas Consortium website. Content of this system at the end of the project period will be 300,000 specimen records, images representing 2,000 Texas species, and key/description coverage of over 1,000 species. A primary objective will be to establish procedures and standards that will allow continued development of this public resource via multi-institutional cooperation and stewardship. An important component of this project will be the interdisciplinary training of graduate and undergraduate students in botany and computer science. 
The digital flora produced by this project will be a valuable resource for students at all levels, educators, scientists, government agencies, and all with an interest in the botanical heritage of Texas.
 

Description of Research:

Research Objectives: Three-fourths of the 680 plant species listed as in danger of becoming extinct in the next ten years occur in Texas, California, Florida, Hawaii, and Puerto Rico. Texas is the only state of this group that lacks a comprehensive geographical knowledge of its plant resources, and the most recently written flora for the state (Correll & Johnston, 1970) is obsolete. Although vascular plant diversity represents the foundation of both natural and agricultural ecosystems, critical questions relating to the Texas flora from commercial enterprises, governmental agencies, scientific sources, schools, and the general public are currently insufficiently answered by individual academic centers. A single, consolidated source of high quality data is needed, and this need will increase as the population and economy of Texas grow.

Expanded public access to the Internet, and its use as a teaching resource at all educational levels, has established the World Wide Web (WWW) as a primary public information source. The most expedient and cost-effective way to establish a new knowledge base for Texas plants is to place collections data and scientific expertise associated with Texas herbaria on the Internet. Developed through more than 100 years of botanical research and increased on a daily basis by current work, these herbaria collectively contain over 1,000,000 Texas plant specimens. These specimens are permanent, archived documents of plants at the time of collection that include biological identification, specific date and locality of collection, and often the ecological setting. This body of material, which represents a significant, long-term investment of public funding, provides a continuously curated and fully documented perspective on the vascular flora of Texas. Research proposed here is focused on maximizing public access to these resources via the Internet.

The Flora of Texas Consortium (FTC) was organized in 1993 to make the vast resources housed in member herbaria available to the people of Texas. The consortium consists of the two largest educational institutions in the state, plus 19 public and private institutions (URL 1). Since no single institution is comprehensive, in terms of expertise and specimen data, for the entire Texas flora, a collaborative approach is essential. To our knowledge, Texas has never had a project of this scope nor with so many institutions engaged in a joint effort to provide data of interest to Texans as well as state and national organizations. The goals of the Flora of Texas Project are to 1) create an electronic database of information about the ca. 6,000 taxa of native and naturalized Texas plants; 2) make this database accessible via the Internet; 3) use the database to provide descriptions of species, regional synonyms, distribution maps, and access to images of Texas plants; and 4) generate floras of all or parts of Texas, especially critical areas such as the Texas/Mexico border region, as required by local, state, and national users.

This proposal involves research needed for continued, cooperative development of an infrastructure of both data capture and data management systems that will allow enhanced, open public access to a single, complete, fully documented array of Texas plant biodiversity - a digital Texas flora. Initial development, with support from THECB-ARP and other sources, involved the production and distribution of prototype herbarium specimen data capture systems that can be installed on common personal computer platforms. Design has centered on the need for rapid specimen label data acquisition by non-professional personnel and data output that is consistent with the FTC data exchange format (URL 2). An MS-Windows input system, developed with THECB-ARP support, is available for public access (URL 3) and is in use at several FTC herbaria. While requested funding involves support for additional work to refine and enhance both types of data entry systems (Apple and PC), our primary focus will be further development of web-based systems, protocols, and standards that allow open, global access to these data. Data entry at seven FTC herbaria has generated digital data for 130,232 specimens, which can be accessed through a prototype 'herbarium specimen browser' system (URL 4). This system allows the user to query the combined set of data, which has been merged and indexed, using a set of selectable listing and mapping options.

Specific objectives are: 1) refinement and expansion, via usage and testing, of current data capture systems to enhance data entry speed and increase label data content captured, 2) deployment of the specimen data capture systems and associated data visualization tools to other FTC herbaria (Angelo State, Southwest Texas State), 3) development of 'translation' programs that will convert output from other specimen input programs ('Biota' - URL 5, 'Specify' - URL 6) to files suitable for FTC processing and convert FTC-standard data files to formats or standards now being established for regional or national systems, 4) continued refinement of the web-based access system, including extension of the current mapping system from county-level Texas maps to state/province-level North American maps and exploration of applications of geographic coordinates, 5) production of systems that allow user access to digital images representing Texas species and development of an infrastructure to allow FTC members to build plant image libraries that are accessible to users, 6) development of applications that will allow conversion of data based on different nomenclatural treatments to an integrated form for WWW display, either as a nomenclaturally standardized array or as different 'views', and 7) using draft files for the Manual of Vascular Plants of Brazos and Surrounding Counties now under development by M. Reed at Texas A&M as a working template, exploration of novel indexing techniques for expression of traditional floristic materials (keys and taxon descriptions) with embedded linkage to displays of specimen data and images. These files, which include keys and descriptions for over 1,000 vascular plant species, were used as a project by computer science graduate students in a course, Information Storage and Retrieval, during the Fall of 1998. Initial results from this experiment were very encouraging and the resulting prototype will be developed further.
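
As an illustration of objective 3, a 'translation' program might be sketched as follows. This is a minimal sketch only: the source column names, the target field names, the pipe-delimited output layout, and the herbarium code are all hypothetical, and they do not reproduce the actual FTC data exchange format (URL 2) or the export formats of 'Biota' or 'Specify'.

```python
import csv
import io

# Hypothetical mapping from a source program's column names to
# exchange-format field names (NOT the real FTC field list).
FIELD_MAP = {
    "Genus": "genus",
    "Species": "species",
    "County": "county",
    "CollDate": "date",
}


def translate(source_csv, herbarium_code):
    """Convert comma-separated source records to pipe-delimited records.

    Columns that are not listed in FIELD_MAP are dropped; mapped columns
    that are missing from a record are written as empty fields.
    """
    reader = csv.DictReader(io.StringIO(source_csv))
    out = io.StringIO()
    fields = ["herbarium"] + list(FIELD_MAP.values())
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="|",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        record = {"herbarium": herbarium_code}
        for src, dst in FIELD_MAP.items():
            record[dst] = (row.get(src) or "").strip()
        writer.writerow(record)
    return out.getvalue()


if __name__ == "__main__":
    # Invented sample record, for illustration only.
    sample = ("Genus,Species,County,CollDate,Notes\n"
              "Helianthus,annuus,Brazos,1987-06-14,roadside\n")
    print(translate(sample, "DEMO"), end="")
```

Keeping the field mapping in a single table means that support for another input program would require only a new mapping, not a new program.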

As Texas moves to extend WWW access to all public school districts in the state, users of products of this research will increasingly be K-12 teachers seeking high quality, documented information on Texas plants. Proposed products, such as taxonomic listings, descriptions, keys, plant images, and distribution maps comprise digital 'raw materials' for educators and students at all levels. Undergraduate students with an interest in the WWW are able to explore this realm as part of the curriculum, as either special topics research projects (URL 7) or WWW-based seminars (URL 8). As indicated by a recent award-winning paper (URL 9), which describes a prototype of the specimen browser system, work at this developing interface also opens new and interesting research and training opportunities for graduate students.

Methodology: This area of research requires communication between scientists working in the life sciences and engineering, two areas that are normally well separated by academic administrative structures and scientific culture. Progress to date has been based on problem-oriented interactions among botanists and computer scientists through scheduled meetings of the FTC, discussion on an FTC e-mail listserver, and weekly meetings of the Texas A&M Bioinformatics Working Group, an interdisciplinary group which includes botanists and computer scientists. There are no precedents for this type of work in terms of magnitude (multi-herbaria data resources) or WWW biodiversity data presentation (novel indexing, county-level mapped visualization), although the CalFlora project (URL 10) has similar objectives. Consequently, solutions developed by this multi-institutional, interdisciplinary group are derived from interactive exploration, testing, and refinement of versioned prototype systems via continuous discussion among all involved. Funding requested here will allow this interaction, which has been productive, to continue and expand.

Work involving multiple institutions requires a basic premise of local autonomy, i.e., protocols, traditions, and working standards in place at participating Texas herbaria must not be impacted by consortium operations. Thus, FTC prototype systems are the product of a data exchange format that we will continue to develop during the project period. The current method of data exchange and expression, to be explored in more detail during the project period, involves transfer of standard data files from participating herbaria to a common repository. To update data available through the current web interface, a program processes these files to produce a series of indices. Operations of the web-based 'front end' involve URL queries to these indices to generate both HTML-formatted text and mapping output for the user. This approach allows data contributors to develop and curate local data using familiar systems and protocols. It also concentrates processing at the point of origin and thereby removes the cost of processing for the user. Thus, systems under development by the FTC, via interaction with computer scientists at Texas A&M's Center for the Study of Digital Libraries (CSDL), provide quick response to user queries and other features, such as real time mapped visualization of distributions and color-coded diversity maps, that reflect the simplicity and speed of the best indexing algorithms.
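
The exchange-and-index approach described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the FTC implementation: the field names, the pipe-delimited file layout, the herbarium codes, and the sample records are invented, and the real system answers URL queries on a web server instead of calling functions in memory.

```python
from collections import defaultdict
import csv
import html
import io

# Invented sample data files, one per contributing herbarium.
# Herbarium codes and field names are hypothetical.
SAMPLE_FILES = {
    "HERB_A": "taxon|county|year\n"
              "Helianthus annuus|Brazos|1987\n"
              "Helianthus annuus|Travis|1991\n",
    "HERB_B": "taxon|county|year\n"
              "Helianthus annuus|Brazos|1994\n"
              "Quercus stellata|Brazos|1979\n",
}


def build_indices(files):
    """Merge all contributed files into two precomputed indices.

    by_taxon maps a taxon name to its specimen records;
    county_counts maps a taxon name to {county: specimen count}.
    """
    by_taxon = defaultdict(list)
    county_counts = defaultdict(lambda: defaultdict(int))
    for herbarium, text in files.items():
        for row in csv.DictReader(io.StringIO(text), delimiter="|"):
            by_taxon[row["taxon"]].append(dict(row, herbarium=herbarium))
            county_counts[row["taxon"]][row["county"]] += 1
    return by_taxon, county_counts


def query(by_taxon, county_counts, taxon):
    """Answer a query by index lookup only; no file is re-read."""
    records = by_taxon.get(taxon, [])
    counties = dict(county_counts.get(taxon, {}))
    return records, counties


def render_html(taxon, records):
    """Format one query result as a simple HTML list."""
    items = "".join(
        "<li>{} County, {} ({})</li>".format(
            html.escape(r["county"]), html.escape(r["year"]),
            html.escape(r["herbarium"]))
        for r in records)
    return "<h2>{}</h2><ul>{}</ul>".format(html.escape(taxon), items)


if __name__ == "__main__":
    by_taxon, county_counts = build_indices(SAMPLE_FILES)
    records, counties = query(by_taxon, county_counts, "Helianthus annuus")
    print(render_html("Helianthus annuus", records))
    print(counties)  # per-county counts could drive a county-level map
```

Because the indices are rebuilt only when contributors send new files, each user query reduces to a lookup, which is the property the text credits for quick response times.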

While refinement of established local data input and WWW output prototypes will be a primary objective, THECB-ARP support will also allow the group to pursue elements of a digital flora that extend beyond specimen data development and expression. A primary objective of the FTC is to create a digital replacement for Correll and Johnston's Manual of the Vascular Plants of Texas (1970). The digital medium has a potential to employ elements that are difficult or impossible to express in traditional, hardcopy floristic manuals and we would like to maximize this potential. Each of the over 6,000 vascular plant taxa found in Texas has two defining features that are of primary importance for the typical user, a range of distribution and a structural identity. While our current prototype includes a mapping module (either taxa or specimens) that will be developed further, identification of plants is routinely done using 'key' morphological features. Due to restrictions associated with hardcopy publication, graphic display of these features is either not present in most regional/state floras or restricted to line drawings. Advances in digital photography and WWW multimedia capability allow descriptive imagery to be presented in much richer graphic detail. Participating herbaria will pursue development of local vascular plant image libraries via digital conversion of extant slide collections now in place at each facility or by direct digital photography. Slides will be digitized using scanners or commercial conversion to Kodak CD format with batch processing to browser-compatible file formats. File names, plant names, and photographer comments will be maintained at library sites using either local systems (spreadsheet/database) or a new module of the FTC input systems allowing the user to browse photos available for a given taxon. 
A prototype system that employs program-generated access pages that link technical plant names to an index that carries links to over 8,800 vascular plant image files positioned at different server locations has been under development for several years (URL 11). The 'content' importance of linkage between plant photos and accurate taxonomic identifications is indicated by over 52,000 queries to HTML pages associated with this site during the period 13 June to 12 July 1999. In addition to work with plant images, we will also examine and test technologies and procedures that would allow inclusion of traditional floristic content, i.e., taxonomic keys and descriptions. Preliminary development, described above, of an indexed manual of keys and descriptions will be pursued in more detail. Of special interest will be methods that allow immediate updates to indices as changes (corrections, nomenclatural changes, additions) are made in base word processing files for both descriptions and keys, assignment of authorship for products of this nature produced by multiple contributors, linkage of text descriptions and key couplets with supporting images, and linkage of distribution statements with available maps. Our planned schedule involves: 1 Jan - 30 Jun 2000: Establish a working base (personnel, hardware, software installation) at all sites, deploy and test data management software and procedures at new input sites, design new extensions/additions for extant tools and begin implementation; 1 Jul - 31 Dec 2000: Initiate a website for central access to Texas plant images and test modifications of the output system that links specimen data to images; 1 Jan - 30 Jun 2001: Continue modification/testing of input and web output systems as needed and both image and specimen data content development; 1 Jul - 31 Dec 2001: Complete all websites associated with the project (specimen data, images, keys/descriptions) and interlinking systems.
Anticipated content at this time will be 300,000 specimen records, images representing 2,000 Texas species, and key/description coverage of ca. 1,000 taxa with full linkage via the web interface.

Research Personnel: Program code for further development of current systems and new applications will be generated by two computer science graduate students and an undergraduate assistant at the CSDL. This research, defined by interactions among project P.I.s, will be directed by Dr. Leggett.

Activities at each participating herbarium will be directed by Drs. Hatch (TAES), Amos (ASU), Simpson (PRC), Lemke (SWT), and Wilson (TAMU). Dr. Todzia (Research Associate-PRC) will work with CSDL programmers and a PRC graduate student to develop Apple-based systems and manage operations at the large PRC collection. Specimen processing, image acquisition and processing, and HTML production will be mediated by a single graduate student at each facility. Routine image processing, HTML coding, and specimen data capture will be accomplished by two student workers at SWT and TAES, and one at other herbaria.

Institutional Commitment: Work proposed here has been established as a research project by the CSDL, a unit based at the Texas Engineering Experiment Station and dedicated to digital libraries research and development of advanced Internet applications. This facility, directed by Dr. Leggett, is providing server support (Sparc1000), systems administration, programming assistance, and critical expertise. The University of Texas at Austin provides operating expenses for the Plant Resources Center and pays part of Dr. Todzia’s salary. This facility has leveraged $20,000 in contracts from the USDA Natural Resources Conservation Service for the initial input system development, creation of a local access node (URL 12), and contribution of over 30,000 records to the merged FTC data set. Southwest Texas State University provides funding for normal operating expenses of the SWT Herbarium, and a privately endowed Herbarium Development Fund has been established to cover support for student collecting trips and hourly wages for undergraduate student workers. An Internet connection has been dedicated to the project, and SWT provides one-quarter release time to Dr. Lemke for research activities associated with the herbarium. The Tracy Herbarium, a component of the Texas Agricultural Experiment Station, has three networked Pentium workstations dedicated to specimen data input. The Biology Department Herbarium at Texas A&M University supports the coordinating project P.I., Dr. Wilson, with assistance from its full-time staff Herbarium Botanist, Ms. Reed. This facility includes four Pentium workstations, each connected to departmental servers for file storage and communications. Ms. Reed's involvement with the Flora of Texas Project will be supported by the Biology Department and, if the project receives THECB-ARP funding, her current workstation will be upgraded using departmental funds.
The Department of Biology Herbarium at Angelo State University has one Pentium workstation (with connection) dedicated to the herbarium. ASU is currently providing summer support for the P.I., Dr. Amos; an ASU computer science professor, Dr. Motl; and a group of four graduate and undergraduate students for work with the herbarium data capture and retrieval system.

Student Involvement and Training Opportunities in Science and Engineering: Proposed research, centered on an interface between two distinct areas of academic activity, engineering and botany, opens new opportunities for faculty and students in both areas. Graduate students will receive training in bioinformatics, data management, and curatorial procedures. Graduate students in computer science and botany will share responsibility with the P.I.s for refinement and expansion of data capture systems, development of translation programs, and production of systems that allow users to access data and images. Undergraduate students will be involved with data entry systems testing by interacting with faculty and graduate students. They will also assist in specimen curation, image processing, and other operations that provide enhanced exposure to both new technologies and systematic botany.

Budget Justification: Salary support requested here will allow the P.I.s to work toward project goals with minimal impact on current teaching and research responsibilities. This will be accomplished by one month summer salary for most P.I.s and Dr. Todzia, and salary support for students at all collaborating sites. Two CSDL graduate students, working in areas of computer science immediately relevant to project objectives, will produce program code (PC database routines, server-based CGI, Java, etc.) that will refine current tools and generate new products. Botany graduate students will pursue proposal-related botanical projects relating to specimen data, plant images, or descriptive/identification output systems. Because equipment types and needs vary at each site, support requested here will be invested in either digital image acquisition or computer hardware at each facility according to local needs and priorities relative to project objectives. Funds requested for domestic travel will support project P.I. travel to FTC meetings (twice per year) and presentation of project-related products at a national meeting by the coordinating P.I. and graduate students during the second year. The ‘materials and supplies’ budget will provide for film and other expendable items and software upgrades as needed.

Bibliography:

Correll, D. S. and M. C. Johnston. 1970. Manual of the Vascular Plants of Texas. Texas Research Foundation, Renner, Texas (reprinted 1979).

URL 1: http://www.csdl.tamu.edu/FLORA/ftc/ftchome.htm (FTC homepage)
URL 2: http://www.csdl.tamu.edu/FLORA/ftc/ftcffld4.htm (FTC Data exchange format)
URL 3: http://www.csdl.tamu.edu/FLORA/input/inputsys.html (Tracy Specimen Data Input System)
URL 4: http://www.csdl.tamu.edu/FLORA/ftc/ftphsb.htm (Herbarium Specimen Browser)
URL 5: http://viceroy.eeb.uconn.edu/biota (Biota Specimen Data Input System)
URL 6: http://www.usobi.org/specify/ (Specify Specimen Data Input System)
URL 7: http://www.csdl.tamu.edu/FLORA/biolherb/tamu485.htm (Example: Internet-based undergraduate research)
URL 8: http://www.csdl.tamu.edu/FLORA/Wilson/481/semfront99.htm (Example: Internet-based undergraduate seminars)
URL 9: http://www.csdl.tamu.edu/FLORA/papers/webnet97/ (Example: Graduate Student work)
URL 10: http://elib.cs.berkeley.edu/calflora/about.html (CalFlora Project)
URL 11: http://www.csdl.tamu.edu/FLORA/gallery.htm (Vascular Plant Image Gallery)
URL 12: http://129.116.69.198/ (PRC Flora of Texas database)

Additional Materials:

Demo screen displays and discussion of FTC projects:

            Endemics, Helianthus, Herbarium Specimen Browser

Example - Undergraduate Teaching:

            Seminar page, sample report page, sample report map

Additional information for collaborating institutions:  page 1 and page 2

