Exclusion statistics for more accurate estimation.
Many source code optimizations, primarily at a fairly low level of detail.
This directory contains the distribution for a package to do compression via arithmetic coding in Java. A very brief description of arithmetic coding with lots of pointers to other material can be found in:
The arithmetic coding package contains a generic arithmetic coder and decoder, along with byte stream models that are subclasses of Java's I/O streams. Example statistical models include a uniform distribution, a simple unigram model, and a parametric prediction by partial matching (PPM) model. Other models can be built in the framework in the same way as the examples. A prebuilt set of javadoc is available online.
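To make the idea of a statistical model concrete, here is a minimal sketch of an adaptive unigram byte model. It is an illustration only, not the package's API: the class name UnigramSketch and its methods interval and increment are hypothetical, and it assumes a modern JDK rather than the 1.3/1.4 compilers mentioned in this document. The model hands a coder three counts (low, high, total) so that a symbol owns the subinterval [low/total, high/total), and it adapts by incrementing the count of each symbol after it is coded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical adaptive unigram byte model; a sketch, not the package's API. */
public class UnigramSketch {
    private final int[] counts = new int[256];
    private int total = 256;

    public UnigramSketch() {
        // Start every byte at count 1 so no symbol has zero probability.
        Arrays.fill(counts, 1);
    }

    /** Returns {low, high, total}; the symbol owns [low/total, high/total). */
    public int[] interval(int symbol) {
        int low = 0;
        for (int i = 0; i < symbol; i++) {
            low += counts[i]; // cumulative count of all smaller symbols
        }
        return new int[] { low, low + counts[symbol], total };
    }

    /** Adapts the model after a symbol has been coded. */
    public void increment(int symbol) {
        counts[symbol]++;
        total++;
    }

    public static void main(String[] args) {
        UnigramSketch model = new UnigramSketch();
        for (byte b : "abracadabra".getBytes(StandardCharsets.US_ASCII)) {
            int symbol = b & 0xFF;
            int[] iv = model.interval(symbol);
            System.out.println((char) symbol + " -> [" + iv[0] + ", " + iv[1] + ") / " + iv[2]);
            model.increment(symbol);
        }
    }
}
```

A real model would also rescale counts to keep the total within the coder's precision and would use a faster cumulative-count structure than a linear scan; both are omitted here for brevity.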
Download the source jar to a target directory,
cd there, unjar the source, and run ant test.
The distribution comes in three parts:
The precompiled class files may be used directly from the jar by putting
the filename of the jar in the CLASSPATH, either in the
environment or in the -classpath argument to the JVM.
I attempted to structure the package according to the recommendations in Sun's Requirements for Writing Java API Specifications and in the book Effective Java by Joshua Bloch.
The documentation may be unpacked into a directory by executing
the following sequence of commands, where $TARGET_DIR is the
target directory into which the documentation is to be unpacked.
mv colloquial_arithcode_doc-1_1.jar $TARGET_DIR
cd $TARGET_DIR
jar xvf colloquial_arithcode_doc-1_1.jar
The source may be unpacked in exactly the same way as the documentation:
mv colloquial_arithcode_src-1_1.jar $TARGET_DIR
cd $TARGET_DIR
jar xvf colloquial_arithcode_src-1_1.jar
The source has been tested with the 1.3 and 1.4 Java 2 SDK compilers. The code may be compiled directly, since it is packaged in the appropriate directory structure. But I prefer to use Apache Ant; it's like makefiles, only in XML and comprehensible (I know that sounds like an oxymoron several times over). Ant can be downloaded as part of the Java Web Services Developer Pack. I followed the directory structure suggested by the useful but strangely named article Using Ant in Anger.
After unpacking the jar file and installing ant, the package and
documentation may be built from the command line and tests may be run.
For ant builds, there is a file conventionally named build.xml which
contains the build instructions. Here is the command to test that
everything's working, where $TARGET_DIR is where the
source was unpacked:
cd $TARGET_DIR
ant test
Run ant -projecthelp to examine the other targets, or
look at the source.
I ran everything first through
java -Xprof,
which is just the Sun Java runtime with profiling for Windows. It actually
caught all the glaring inefficiencies, but to double-check, I ran it through
Rational Quantify for Windows.
Quantify is great; I use it at work. Get it if you can afford it.
It's worth it if you need the speed. The implementations used here don't have
any glaring inefficiencies; the algorithms on the other hand ... If you have suggestions
for improvements or want to point out a glaring inefficiency, I'd be glad to
hear about them.
For best performance, run java with the -server option. You can find a lot of information in the
Java Hotspot White Paper,
which contains a description of the
Java HotSpot Server Compiler.
The server
compiler provides extensive inlining of one-line functions and other
code unfolding and folding based on runtime analysis of hotspots. The
compressors and decompressors run around 10-50% faster in this mode. For more discussion
of this compiler and tuning Java in general, see
Java Performance Tuning by Jack Shirazi.
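One way to check the effect of the -server option on your own machine is a small timing harness, run once with plain java and once with java -server. The sketch below is an assumption-laden illustration: the class TimingSketch and its method meanNanos are hypothetical names, the workload is a placeholder byte-summing loop rather than the package's compressor, and System.nanoTime requires a newer JDK than the 1.3/1.4 releases mentioned earlier. Replace the placeholder with a compression or decompression call to measure the real thing.

```java
/** Hypothetical timing harness; a sketch, not part of the package. */
public class TimingSketch {

    /**
     * Runs the task untimed for the given number of warm-ups, then returns
     * the mean nanoseconds per run over the timed runs (runs must be > 0).
     */
    public static long meanNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // give the JIT a chance to compile hot code first
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        final byte[] data = new byte[1 << 20];
        final long[] sink = new long[1]; // keeps the loop's result live
        // Placeholder workload standing in for a compress or decompress call.
        Runnable placeholder = new Runnable() {
            public void run() {
                long sum = 0;
                for (int i = 0; i < data.length; i++) {
                    sum += data[i] & 0xFF;
                }
                sink[0] += sum;
            }
        };
        System.out.println("mean ns/run: " + meanNanos(placeholder, 20, 50));
    }
}
```

The warm-up runs matter because HotSpot compiles code only after it has been identified as hot; timing the first few iterations would mostly measure the interpreter.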