Compression via Arithmetic Coding in Java. Version 1.1

Apache/BSD Licensing

The arithmetic coding package is licensed under the standard Apache/BSD license.

Changes in Version 1.1

Exclusion statistics for more accurate estimation.

Many source code optimizations, primarily at a fairly low level of detail.


This directory contains the distribution for a package to do compression via arithmetic coding in Java. A very brief description of arithmetic coding with lots of pointers to other material can be found in:

The arithmetic coding package contains a generic arithmetic coder and decoder, along with byte stream models that are subclasses of Java's I/O streams. Example statistical models include a uniform distribution, a simple unigram model, and a parametric prediction by partial matching (PPM) model. Other models can be built in the framework in the same way as the examples. A prebuilt set of javadoc is available online:
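To make the idea of a statistical model concrete, here is a minimal sketch of an adaptive unigram model over bytes. It is not code from com.colloquial.arithcode: the class name UnigramSketch and its methods are hypothetical, chosen only for illustration. It assumes every byte starts with a count of 1 and that the model hands the coder a cumulative-count interval (low, high, total) for each symbol. The sketch does not encode anything; it only sums the ideal code length, -log2(p), that an arithmetic coder would approach under this model.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration only; not a class from com.colloquial.arithcode. */
public class UnigramSketch {
    private final int[] counts = new int[256];
    private int total = 256;

    public UnigramSketch() {
        // Assumption: start every byte at count 1 so no symbol has zero probability.
        Arrays.fill(counts, 1);
    }

    /** Returns {low, high, total}: the cumulative-count interval for the symbol. */
    public int[] interval(int symbol) {
        int low = 0;
        for (int i = 0; i < symbol; i++) {
            low += counts[i];
        }
        return new int[] { low, low + counts[symbol], total };
    }

    /** Updates the model after a symbol has been coded. */
    public void increment(int symbol) {
        counts[symbol]++;
        total++;
    }

    /** Ideal code length in bits for the data under the adaptive model. */
    public static double idealBits(byte[] data) {
        UnigramSketch model = new UnigramSketch();
        double bits = 0.0;
        for (byte b : data) {
            int symbol = b & 0xFF;
            int[] iv = model.interval(symbol);
            double p = (iv[1] - iv[0]) / (double) iv[2];
            bits -= Math.log(p) / Math.log(2.0);
            model.increment(symbol);
        }
        return bits;
    }

    public static void main(String[] args) {
        byte[] data = "abracadabra abracadabra".getBytes(StandardCharsets.US_ASCII);
        // A uniform model spends exactly 8 bits on every byte.
        System.out.printf("uniform: %.1f bits%n", 8.0 * data.length);
        System.out.printf("unigram: %.1f bits%n", idealBits(data));
    }
}
```

A uniform model corresponds to never calling increment, so every byte costs 8 bits. A PPM model would condition the counts on the preceding bytes rather than keeping a single table.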

Quick Start for Java Pros

Download the source to a target directory, cd there, unjar the source, and run ant test.

Downloading com.colloquial.arithcode

The distribution comes in three parts:

Using com.colloquial.arithcode

The precompiled class files may be used directly from jar by putting the filename of the jar in the CLASSPATH, either in the environment or in the argument to the JVM.

I attempted to structure the package according to the recommendations in Sun's Requirements for Writing Java API Specifications and in the book Effective Java by Joshua Bloch.

Unpacking the documentation

The documentation may be unpacked into a directory by executing the following sequence of commands, where $TARGET_DIR is the target directory into which the documentation is to be unpacked.

The documentation was intended to follow the recommendations laid out in Sun's How to Write Doc Comments. I also used Sun's Doc Check Utilities to check for incomplete documentation and linking errors, but they only work with the 1.2 and 1.3 Java 2 SDKs.

Building from the source

The source may be unpacked in exactly the same way as the documentation:

The source has been tested with the 1.3 and 1.4 Java 2 SDK compilers. The code may be compiled directly; it is packaged in the appropriate directory structure. But I prefer to use Apache Ant; it's like makefiles, only in XML and comprehensible (I know that sounds like an oxymoron several times over). Ant can be downloaded as part of the Java Web Services Developer Pack. I followed the directory structure suggested by the useful but strangely named article Using Ant in Anger.

After unpacking the jar file and installing ant, the package and documentation may be built from the command line and tests may be run. For ant builds, there is a file conventionally named build.xml which contains the build instructions. Here is the command to test that everything's working, where $TARGET_DIR is where the source was unpacked:

There are no environment variable dependencies in the build.xml. Use ant -projecthelp to examine the other targets, or look at the source.

Tools for Performance Tuning

I ran everything first through java -Xprof, which is just the Sun Java runtime with profiling for Windows. It actually caught all the glaring inefficiencies, but to double-check, I ran it through Rational Quantify for Windows. Quantify is great; I use it at work. Get it if you can afford it. It's worth it if you need the speed. The implementations used here don't have any glaring inefficiencies; the algorithms on the other hand ... If you have suggestions for improvements or want to point out a glaring inefficiency, I'd be glad to hear about them.

Performance Note

For best performance, run java with the -server option. You can find a lot of information in the Java HotSpot White Paper, which contains a description of the Java HotSpot Server Compiler. The server compiler provides extensive in-lining of one-line functions and other code unfolding and folding based on runtime analysis of hotspots. The compressors and decompressors run around 10-50% faster in this mode. For more discussion of this compiler and tuning Java in general, see Java Performance Tuning by Jack Shirazi.
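One quick way to confirm which VM a run actually used is to print the runtime's self-reported name. The class below is a hypothetical helper, not part of the package. java.vm.name is a standard system property; java.vm.info is, as far as I know, specific to HotSpot-style runtimes and may print null elsewhere.

```java
/** Hypothetical helper for checking which VM is running; not part of the package. */
public class VmCheck {
    /** Returns the VM's self-reported name (standard system property). */
    public static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println("java.vm.name = " + vmName());
        // Assumption: this property is runtime-specific and may be absent.
        System.out.println("java.vm.info = " + System.getProperty("java.vm.info"));
    }
}
```

Running it once as java VmCheck and once as java -server VmCheck should show whether the option changes anything on a given installation. On some later JVMs the server compiler is already the default, in which case the two runs report the same VM.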

Compression Rates and Speed

Copyright 2002-2003. Bob Carpenter. Maintained by Bob Carpenter,