CPSC 489/IR/Spring 2002/Leggett/Programming Lab 1/Due February 7

Canonical Huffman Coding

This lab consists of two parts:

   Part I. Generating a canonical Huffman code.

       Input:     A file containing a set of words and their frequencies as extracted from a document collection.

       Output:  A canonical Huffman code based on the input (layout as in Table 2.2, p. 35)
                     Three vectors:
                         Symbol - sorted in ascending sequence within codeword length
                         Length - length in bits of the canonical Huffman code for the symbol
                         Code - the canonical Huffman code for the symbol
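
   A minimal sketch of Part I, under stated assumptions. The function names (code_lengths, canonical_code) are hypothetical, and the frequencies are taken as an in-memory dictionary because this handout does not specify the input file format. Codewords are numbered so that shorter codewords come first and codewords of the same length are consecutive integers. This is one common canonical convention; the layout in Table 2.2 may number the codewords differently, so check it against the text before relying on it.

```python
import heapq


def code_lengths(freqs):
    """Return {symbol: codeword length in bits} from a Huffman merge over freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit (assumption for this edge case).
        return {sym: 1 for sym in freqs}
    # Heap entries are (weight, tiebreak, symbols in this subtree).
    # The unique tiebreak keeps the symbol lists from ever being compared.
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths


def canonical_code(lengths):
    """Return (symbol, length, code) rows sorted by length, then by symbol."""
    rows = []
    code = 0
    prev_len = 0
    ordered = sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))
    for sym, length in ordered:
        # Moving to a longer length appends zero bits to the running value.
        code <<= length - prev_len
        rows.append((sym, length, f"{code:0{length}b}"))
        code += 1
        prev_len = length
    return rows


if __name__ == "__main__":
    # Made-up frequencies, for illustration only.
    demo = {"a": 5, "b": 2, "c": 1, "d": 1}
    for sym, length, code in canonical_code(code_lengths(demo)):
        print(sym, length, code)
```

   The three columns printed correspond to the Symbol, Length, and Code vectors described above. Only the codeword lengths come from the Huffman tree; the bit patterns are then assigned from the lengths alone.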

   Part II. Encoding with a canonical Huffman code.

       Input:     A text file to be compressed.
                     The necessary data structures generated during Part I.
                     Note: This does not include the complete code table.

       Output:  A two-dimensional display of the running text with the canonical Huffman code aligned with and just below the text.
                    For example:
                          the   cat        scratched   and
                          10110101101101101110111
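
   A minimal sketch of Part II, under stated assumptions. It encodes without a complete code table: it keeps only the sorted symbol list, the parallel length list, and, for each codeword length, the first codeword value and the index of the first symbol of that length. All names here (build_tables, encode_word, aligned_display) are hypothetical, and the numbering convention (shorter codewords first, consecutive values within a length) is an assumption that may differ from the textbook's. Words missing from the symbol list raise a KeyError, since the handout does not say how to treat them.

```python
def build_tables(symbols, lengths):
    """Build compact encoder tables.

    symbols must be sorted by (length, symbol); lengths is the parallel list.
    """
    first_code = {}   # length -> integer value of the first codeword of that length
    first_index = {}  # length -> index in symbols of the first symbol of that length
    code = 0
    prev_len = 0
    for i, length in enumerate(lengths):
        if length != prev_len:
            code <<= length - prev_len
            first_code[length] = code
            first_index[length] = i
            prev_len = length
        code += 1
    position = {sym: i for i, sym in enumerate(symbols)}
    return {"lengths": lengths, "first_code": first_code,
            "first_index": first_index, "position": position}


def encode_word(word, tables):
    """Compute the codeword for word from its offset within its length group."""
    i = tables["position"][word]
    length = tables["lengths"][i]
    value = tables["first_code"][length] + (i - tables["first_index"][length])
    return f"{value:0{length}b}"


def aligned_display(words, tables):
    """Return two strings: the text, and the codewords aligned beneath it."""
    top, bottom = [], []
    for word in words:
        bits = encode_word(word, tables)
        # Pad both rows to the wider of the word and its codeword.
        width = max(len(word), len(bits))
        top.append(word.ljust(width))
        bottom.append(bits.ljust(width))
    return " ".join(top).rstrip(), " ".join(bottom).rstrip()


if __name__ == "__main__":
    # Made-up symbols and lengths, for illustration only.
    tables = build_tables(["a", "b", "c", "d"], [1, 2, 3, 3])
    text_row, code_row = aligned_display(["a", "b", "d", "a"], tables)
    print(text_row)
    print(code_row)
```

   Padding each column to the wider of the word and its codeword keeps every codeword starting directly under the first letter of its word.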

Notes:

   1. Input for this lab can be found at:
             /user/leggett/ir/chc.part1.input
             /user/leggett/ir/chc.part2.input

   2. When you have completed the lab, turn in your output and code listing as three files:
             /pub/homework/489-501/chc.part1.output
             /pub/homework/489-501/chc.part2.output
             /pub/homework/489-501/chc.code.listing

   3. After placing the three files in the homework directory, send me an email that includes your full name and userid. I will check the results in all three files, review your code, and return the lab grade sometime after receiving your email.