
Software (GPL)

This page contains some of the software I have written. The software is licensed under the GNU GPL, which allows you to freely download, modify and redistribute the programs and their source code. If you distribute a modified version of any of these programs, you must make the full source code, including your modifications, available to all recipients.

Each project has its own page; click the heading next to a description to open it.

This is a small editor for the box file format used by the open source OCR engine Tesseract.
This is a collection of Unix command line tools which resemble the GNU coreutils, but operate on XML files rather than plain text files. The initial ideas are explained in this essay.
This is a Bayesian text and email classifier, suitable for spam filtering or other classification tasks. It is small and fast. See this tutorial if you want to do spam filtering.
This program builds a web graph and calculates a generalization of PageRank using Markov chain methods. The project was created to operate on a dataset of just under one million real web pages offered to the public by Google.
subpixelgs is a hack which lets programs such as gv, Ghostview or GSView transparently display PostScript and PDF files using subpixel rendering. It was written before subpixel rendering became common on free Unix systems.
This is the Java source code for the Markov Chain Monte Carlo demonstration applets which you can find elsewhere on this site. The code is provided here AS IS, and is not currently being actively developed.