CPSC 670/IR/Fall 2003/Leggett/Programming Lab 1/Due September 24

Canonical Huffman Coding

This lab consists of two parts:

   Part I. Generating a canonical Huffman code.

       Input:     A file containing a set of words and their frequencies as extracted from a document collection.

       Output:  A canonical Huffman code based on the input (layout as in Table 2.2, p. 35)
                     Three vectors:
                         Symbol - sorted in ascending sequence within codeword length
                         Length - length in bits of the canonical Huffman code for the symbol
                         Code - the canonical Huffman code for the symbol
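
A minimal sketch of how the three vectors in Part I might be produced, in Python. Everything here is an assumption for illustration: the function names, the toy frequencies, and the convention that shorter codewords receive the numerically smaller values (the layout in Table 2.2 may use a different ordering, so check the text before adopting this). Reading the actual input file is left out because its format is not described here.

```python
import heapq
from itertools import count


def code_lengths(freqs):
    """Return {symbol: codeword length} by building a Huffman tree over freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    tie = count()  # tie-breaker so the heap never has to compare symbol lists
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), group1 + group2))
    return lengths


def canonical_code(lengths):
    """Return rows (symbol, length, code), sorted by length and then symbol.

    Assumed convention: codes of equal length are consecutive integers, and
    the running value is shifted left whenever the length increases.
    """
    rows = []
    code, prev_len = 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)
        rows.append((sym, length, format(code, "0{}b".format(length))))
        code += 1
        prev_len = length
    return rows


# Hypothetical frequencies, not taken from the lab input.
toy_freqs = {"the": 5, "cat": 2, "and": 2, "scratched": 1}
table = canonical_code(code_lengths(toy_freqs))
for sym, length, code in table:
    print(sym, length, code)
```

Only the codeword lengths are taken from the Huffman tree; the codes themselves are assigned afterwards from the sorted (length, symbol) order, which is what makes the code canonical.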

   Part II. Encoding with a canonical Huffman code.

       Input:     A text file to be compressed.
                     The necessary data structures generated during Part I.
                     Note: This does not include the complete code table.

       Output:  A two-dimensional display of the running text with the canonical Huffman code aligned with and just below the text.
                    For example:
                          the   cat        scratched   and
                          10110101101101101110111
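
One way Part II could work without the complete code table is sketched below in Python. It assumes the only data kept from Part I are the symbols in canonical order with their lengths, plus the first code and first index for each length; a codeword is then computed as the first code of its length plus the symbol's offset within that length. The names, the toy symbol list, the whitespace tokenization, and the space padding used for alignment are all assumptions for illustration, as is the convention that shorter codewords get the smaller values.

```python
def build_tables(sym_lengths):
    """sym_lengths: list of (symbol, length) sorted by (length, symbol).

    Returns (first_code, first_index, position), where first_code[l] is the
    integer value of the first codeword of length l, first_index[l] is the
    index of the first symbol of length l, and position maps symbol to index.
    """
    first_code, first_index = {}, {}
    code, prev_len = 0, 0
    for i, (_, length) in enumerate(sym_lengths):
        code <<= (length - prev_len)
        if length not in first_code:
            first_code[length] = code
            first_index[length] = i
        code += 1
        prev_len = length
    position = {sym: i for i, (sym, _) in enumerate(sym_lengths)}
    return first_code, first_index, position


def encode_word(word, sym_lengths, tables):
    """Compute the codeword for one word from the compact tables."""
    first_code, first_index, position = tables
    i = position[word]
    length = sym_lengths[i][1]
    value = first_code[length] + (i - first_index[length])
    return format(value, "0{}b".format(length))


def aligned_display(words, sym_lengths, tables):
    """Return (text_line, code_line) with each code starting under its word."""
    top, bottom = [], []
    for w in words:
        bits = encode_word(w, sym_lengths, tables)
        width = max(len(w), len(bits))
        top.append(w.ljust(width))
        bottom.append(bits.ljust(width))
    return " ".join(top).rstrip(), " ".join(bottom).rstrip()


# Hypothetical symbol list in canonical order, not taken from the lab input.
toy_symbols = [("the", 1), ("and", 2), ("cat", 3), ("scratched", 3)]
toy_tables = build_tables(toy_symbols)
text_line, code_line = aligned_display(
    "the cat scratched and".split(), toy_symbols, toy_tables
)
print(text_line)
print(code_line)
```

Words that do not appear in the symbol list raise a KeyError in this sketch; how the lab expects such words, punctuation, or whitespace to be handled is not stated here.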

Notes:

   1. Input for this lab can be found at:
             /user/leggett/ir/chc.part1.input
             /user/leggett/ir/chc.part2.input

   2. You should design a web page for the lab that contains links to: 1) the output from part I, 2) your source code for part I, 3) the output from part II, and 4) your source code for part II.

   3. When you have completed the lab, send an email that includes your full name, userid, and the complete URL of the web page described in note 2 above. Your lab grade will be emailed to you sometime after your email is received.