
markovpr (GPL 2 or higher)

MarkovPR is a PageRank calculator based on Markov chains. It reads a large number of web pages, builds an efficient in-memory representation of the web link graph, and runs a Markov chain particle system that converges to PageRank. Besides plain PageRank, it can compute various generalizations, including a version of PageRank that takes the age of a web page into account. It can also be used to obtain a "perfect sample" from PageRank. It is described in two accompanying papers.
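To give a feel for the particle idea, here is a minimal C++ sketch of a "random surfer" simulation. It is not taken from the MarkovPR sources: the function name `particle_pagerank`, the damping value 0.85, the fixed seed, and the treatment of pages without out-links (teleport to a uniformly random page) are all assumptions made for illustration. Each particle repeatedly either follows a random out-link or teleports, and the fraction of particles sitting on a page after many steps approximates that page's PageRank.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not part of MarkovPR: estimates PageRank by moving
// independent "random surfer" particles over a link graph.
// out_links[i] lists the pages that page i links to.
std::vector<double> particle_pagerank(
    const std::vector<std::vector<std::size_t>>& out_links,
    std::size_t num_particles, std::size_t num_steps,
    double damping = 0.85,  // assumed value, the usual textbook choice
    unsigned seed = 42) {
    const std::size_t n = out_links.size();
    assert(n > 0 && num_particles > 0);

    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> any_page(0, n - 1);

    // Start every particle on a uniformly random page.
    std::vector<std::size_t> position(num_particles);
    for (auto& p : position) p = any_page(rng);

    for (std::size_t step = 0; step < num_steps; ++step) {
        for (auto& p : position) {
            const auto& links = out_links[p];
            if (links.empty() || coin(rng) >= damping) {
                // Teleport: dead end, or the surfer gets bored.
                p = any_page(rng);
            } else {
                // Follow one of the current page's out-links at random.
                std::uniform_int_distribution<std::size_t> pick(0, links.size() - 1);
                p = links[pick(rng)];
            }
        }
    }

    // The empirical distribution of particle positions estimates PageRank.
    std::vector<double> rank(n, 0.0);
    for (std::size_t p : position) rank[p] += 1.0 / num_particles;
    return rank;
}
```

Calling `particle_pagerank(graph, 20000, 50)` on a small graph returns one estimate per page, and the estimates sum to 1 up to rounding. The result is a Monte Carlo estimate, so its accuracy improves with the number of particles rather than with the number of steps alone.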

For their first programming contest (2002), Google made available just under one million web pages from their index, totalling 5 GB of data. MarkovPR read this dataset in about 15 minutes on a 500 MHz Pentium III, using just under 200 MB of RAM, which is fairly impressive for low-end hardware of the time.
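One common way to keep a link graph of this size small in memory is a compressed sparse row layout, sketched below. This is only an illustration of the general technique; the names `LinkGraph` and `build_link_graph` are hypothetical, and nothing here is claimed to match MarkovPR's actual data structures. All out-link lists are stored back to back in one array, with a second array marking where each page's list begins, so the cost is one 32-bit integer per link plus one per page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical compact link graph (compressed sparse row layout);
// an illustration, not MarkovPR's actual data structure.
struct LinkGraph {
    // Out-links of page i are target[offset[i]] .. target[offset[i + 1] - 1].
    std::vector<std::uint32_t> offset;
    std::vector<std::uint32_t> target;

    std::size_t num_pages() const { return offset.size() - 1; }
    std::size_t out_degree(std::uint32_t page) const {
        return offset[page + 1] - offset[page];
    }
};

// Builds the graph from (source, destination) page-id pairs.
LinkGraph build_link_graph(
    std::uint32_t num_pages,
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    LinkGraph g;
    g.offset.assign(num_pages + 1, 0);

    // Count the out-links of each page, shifted by one slot.
    for (const auto& e : edges) {
        assert(e.first < num_pages && e.second < num_pages);
        ++g.offset[e.first + 1];
    }
    // A prefix sum turns the counts into start offsets.
    for (std::uint32_t i = 0; i < num_pages; ++i) g.offset[i + 1] += g.offset[i];

    // Place each destination into its source page's slice.
    g.target.resize(edges.size());
    std::vector<std::uint32_t> cursor(g.offset.begin(), g.offset.end() - 1);
    for (const auto& e : edges) g.target[cursor[e.first]++] = e.second;
    return g;
}
```

With this layout, iterating over the out-links of a page is a scan over a contiguous slice of `target`, which is the access pattern a random-walk simulation needs.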

While the source code here is Free, only a small number of sample web pages are bundled with it. To reproduce the results, you will need to obtain Google's dataset (5 GB) by contacting them yourself, as I am not allowed to redistribute it (besides, I don't have the bandwidth):

This repository of web page information is being provided to you by Google Inc. solely for academic and research purposes related to the Google programming contest. You may not modify, distribute, or make any commercial use of the repository.

markovpr-1.1.tar.bz2 (MD5)
MarkovPR is a collection of programs (written in C++) which build a web link graph and calculate various types of page ranking distributions by means of suitable Markov chains (screenshot). The data defining the web graph must be obtained separately as described inside the download. You can find technical descriptions of this software on my preprints page. Tested on Linux.