Library     -     of technical articles along with code samples supported by Andrew Polar
Huffman Encoding





Huffman encoding is replacement of every symbol in a message by the binary fragment, which bit length is approximately equal to negative logarithm of the probability of occurrence -log2(p) with precision of rounding to the integer. The concept was known before David Huffman introduced his particular technique. The novelty of Huffman encoding concerned the way of construction of these binary fragments that was actually the best possible way in order to achieve shortest encoded message. When alphabet (number of different symbols in a message) is large and symbols have near normal distribution the bit lengths of encoded symbols are close to -log2(p) (rounded up or down) and total length of encoded message is close to theoretically possible limit. The arithmetic encoding developed in 80th practically reach this theoretical limit but has no advantage over Huffman encoding for the case of large alphabets, because Huffman encoding in this case is just a fraction of percent off this limit.

This Huffman encoding is developed mostly for verification of compression limit and not optimized for high performance.

David Huffman
Photo: Don Harris