Instructor: Dr. John Leggett
Office: HRBB 404
Office Hours: MWF 2:40-3:30
Phone: 845-0298
Email:

Teaching Assistant: Teonjoo Ong
Office: HRBB 408G
Office Hours: TTh 1:00-2:00
Phone: 845-4924
Email:

Managing Gigabytes: Compressing and Indexing Documents and Images, second edition, by Ian H. Witten, Alistair Moffat, and Timothy C. Bell, Morgan Kaufmann Publishers, 1999.

Technical reports from leading research labs and papers from journals and conference proceedings.

  Exam 1              20%
  Exam 2              20%
  Exam 3              20%
  Team Project        20%
  Programming Labs    20%

  Final course grades will be based on the weighted average of the components above: A: 100-90; B: 89-80; C: 79-70; D: 69-60; F: 59-0.
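
The grade arithmetic can be sketched in a few lines of Python. This is only an illustration, not an official grade calculator: the names WEIGHTS, course_average, and letter_grade are hypothetical, and the treatment of fractional averages (no rounding, so 89.5 maps to B) is an assumption that the scale above does not settle.

```python
# Component weights in percent, taken from the grading breakdown.
WEIGHTS = {
    "Exam 1": 20,
    "Exam 2": 20,
    "Exam 3": 20,
    "Team Project": 20,
    "Programming Labs": 20,
}


def course_average(scores):
    """Return the weighted average of component scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(average):
    """Map an average to a letter grade using the cutoffs 90, 80, 70, and 60.

    Assumption: averages are not rounded before comparison.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if average >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = {
        "Exam 1": 95,
        "Exam 2": 85,
        "Exam 3": 75,
        "Team Project": 90,
        "Programming Labs": 80,
    }
    avg = course_average(example)
    print(avg, letter_grade(avg))
```

Because every component carries the same weight, the result is the plain mean of the five scores; the weights are kept explicit so the sketch still works if the breakdown changes.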

Exams are comprehensive and cover all discussion in class, all readings and all assignments to date.

Team Project
Teams will be responsible for creating new Internet information servers or deploying existing ones, of various types. For an existing server, the team will obtain the server, build it, customize it, provide clients for its use, and write a simple guide for using it. Each team will present its service to the class, and the grade will reflect the quality of the oral and written presentations. The Team Project must be approved by the instructor.

This class requires a significant amount of programming. You will be responsible for implementing several of the algorithms discussed in the class.

Course Description
Information retrieval (IR) covers issues of representation, storage, and access to very large multimedia document collections. This course covers the fundamental data structures, algorithms, and access methods of current information storage and retrieval systems and relates the various techniques to the design and evaluation of complete retrieval systems delivered on the Internet and in digital libraries. Course content includes coverage of algorithms for indexing, compressing, and querying very large digital collections and tools and techniques for managing information services on the Internet.

Major Topics
Introduction to Information Storage and Retrieval Systems; Introduction to Data Structures and Algorithms Related to Information Retrieval; Text Compression; Indexing; Inverted Files and Signature Files; Compressing Inverted Files; Lexical Analysis; Stemming Algorithms; Thesaurus Construction; Querying; String Search Algorithms; Query Modification Techniques including Relevance Feedback; Boolean Operations; Hashing Algorithms and Ranking Algorithms; Extended Boolean Models; Clustering Algorithms; Index Construction and Compression; Image Types and Image Compression; Textual, Audio and Video Images; Digital Libraries