Collaboration diagram for Ripper:
Public Methods | |
Ripper () | |
~Ripper () | |
void | SetupHandlers () |
void | ParseCmdLineArgs (int argc, char **argv) |
sets flags for all the possible command line options. | |
void | RipRepository (ReposReader *rr) |
reads each document and builds a WebNode from it through GraphBuilder. | |
void | PrintStatistics (ostream &o) |
prints out some memory usage statistics. Useful when diagnosing out-of-memory errors. | |
WebLinkGraph * | PublishWebGraph () |
GraphBuilder * | GetGraphBuilder () |
Public Attributes | |
vector< string > | rep_files_ |
struct { | |
int stop_after | |
int string_memory | |
int jumptable_memory | |
int nodetable_memory | |
int leaftable_memory | |
int start_ID | |
int pvm_numtasks | |
bool pvm_is_master | |
bool repos_from_stdin | |
bool interactive | |
bool print_index | |
bool handler_cat | |
bool handler_caturl | |
bool handler_catdate | |
bool handler_graph_print | |
bool handler_catlinks | |
bool no_graphbuilder | |
} | flags_ |
char * | rippername |
char * | tempdir |
ofstream | indexout |
Private Attributes | |
vector< ParseHandler * > | parsehandlers_ |
GraphBuilder * | gb |
Definition at line 77 of file ripper.cc.
Ripper () | |
Definition at line 123 of file ripper.cc. References defaultrippername, defaulttempdir, flags_, gb, NULL, rippername, and tempdir.
~Ripper () | |
Definition at line 154 of file ripper.cc. References gb and parsehandlers_.
GraphBuilder * | GetGraphBuilder () |
Definition at line 87 of file ripper.cc. References gb. Referenced by main().
void | ParseCmdLineArgs (int argc, char **argv) |
sets flags for all the possible command line options.
Definition at line 253 of file ripper.cc. References flags_, num_docs_processed, rep_files_, RIPPER_NAMELEN, RIPPER_TMPDIRLEN, rippername, tempdir, and usage(). Referenced by main().
void | PrintStatistics (ostream &o) |
prints out some memory usage statistics. Useful when diagnosing out-of-memory errors.
Definition at line 246 of file ripper.cc. References gb, num_docs_processed, and GraphBuilder::StatisticsMem(). Referenced by main() and OutOfMemory().
WebLinkGraph * | PublishWebGraph () |
Definition at line 86 of file ripper.cc. References gb and GraphBuilder::UndockWebGraph(). Referenced by main().
void | RipRepository (ReposReader *rr) |
reads each document and builds a WebNode from it through GraphBuilder.
Definition at line 213 of file ripper.cc. References ReposReader::AtEnd(), flags_, gb, indexout, GraphBuilder::NodeGetAlias(), GraphBuilder::NodeGetAlias_(), GraphBuilder::NodeGetDate(), GraphBuilder::NodeGetID(), GraphBuilder::NodeGetURL(), GraphBuilder::NodeGetURL_(), GraphBuilder::NodeInitialize(), GraphBuilder::NodeInsertLinks(), GraphBuilder::NodeLaunch(), num_docs_processed, parsehandlers_, and ParseElt::Process_Document(). Referenced by main().
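The reference list for RipRepository() (ReposReader::AtEnd(), flags_, parsehandlers_, num_docs_processed) suggests a read loop that hands each document to the registered handlers and honours the stop_after flag. The following is a minimal sketch of such a loop, not the code in ripper.cc. MockReposReader, its Next() method, DocHandler, CountingHandler, and RipAll are hypothetical names introduced for illustration only; only AtEnd(), stop_after, and num_docs_processed come from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ReposReader: serves documents from memory.
// Only AtEnd() is named in the documentation; Next() is an assumption.
class MockReposReader {
public:
    explicit MockReposReader(std::vector<std::string> docs)
        : docs_(std::move(docs)) {}
    bool AtEnd() const { return next_ >= docs_.size(); }
    // Precondition: !AtEnd().
    const std::string& Next() {
        assert(!AtEnd());
        return docs_[next_++];
    }
private:
    std::vector<std::string> docs_;
    std::size_t next_ = 0;
};

// Hypothetical stand-in for ParseHandler: one callback per document.
struct DocHandler {
    virtual ~DocHandler() = default;
    virtual void OnDocument(const std::string& doc) = 0;
};

// Example handler that only counts the documents it is given.
struct CountingHandler : DocHandler {
    int seen = 0;
    void OnDocument(const std::string&) override { ++seen; }
};

// Processes documents until the reader is exhausted or stop_after
// documents have been handled (stop_after == 0 means "no limit").
// Returns the number of documents processed by this call.
int RipAll(MockReposReader& rr, const std::vector<DocHandler*>& handlers,
           int stop_after) {
    int num_docs_processed = 0;
    while (!rr.AtEnd()) {
        if (stop_after != 0 && num_docs_processed >= stop_after) break;
        const std::string& doc = rr.Next();
        for (DocHandler* h : handlers) h->OnDocument(doc);
        ++num_docs_processed;
    }
    return num_docs_processed;
}
```

Checking the limit before reading keeps an unprocessed document from being consumed, so a later call can resume where the previous one stopped.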
void | SetupHandlers () |
Definition at line 162 of file ripper.cc. References flags_, gb, indexout, MakeCatDateHandler(), MakeCatHandler(), MakeCatURLHandler(), MakeGraphHandler(), parsehandlers_, RIPPER_NAMELEN, rippername, tempdir, and usage(). Referenced by main().
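Since SetupHandlers() references flags_, parsehandlers_, and the four Make...Handler() factories, it plausibly registers one handler per enabled handler_* flag. The sketch below illustrates that pattern under this assumption. HandlerFlags, NamedHandler, and BuildHandlers are hypothetical names, and the flag-to-factory pairing is inferred from the identifiers alone, not confirmed by the source.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Subset of the documented flags_ fields that appear to select handlers.
struct HandlerFlags {
    bool handler_cat = false;
    bool handler_caturl = false;
    bool handler_catdate = false;
    bool handler_graph_print = false;
};

// Hypothetical stand-in for ParseHandler; it only carries a label here.
struct NamedHandler {
    std::string name;
};

// Builds one handler per enabled flag, in a fixed order.
// Each branch marks where the corresponding factory would be called
// (assumed pairing: handler_cat -> MakeCatHandler(), and so on).
std::vector<std::unique_ptr<NamedHandler>> BuildHandlers(const HandlerFlags& f) {
    std::vector<std::unique_ptr<NamedHandler>> out;
    auto add = [&out](bool enabled, const char* name) {
        assert(name != nullptr);
        if (enabled) {
            out.push_back(std::make_unique<NamedHandler>(NamedHandler{name}));
        }
    };
    add(f.handler_cat, "cat");             // assumed: MakeCatHandler()
    add(f.handler_caturl, "caturl");       // assumed: MakeCatURLHandler()
    add(f.handler_catdate, "catdate");     // assumed: MakeCatDateHandler()
    add(f.handler_graph_print, "graph");   // assumed: MakeGraphHandler()
    return out;
}
```

Owning the handlers through std::unique_ptr is a choice made for this sketch; the documented parsehandlers_ member stores raw ParseHandler pointers.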
struct {...} | flags_ |
Referenced by main(), ParseCmdLineArgs(), Ripper(), RipRepository(), and SetupHandlers().
GraphBuilder * | gb |
Definition at line 120 of file ripper.cc. Referenced by GetGraphBuilder(), PrintStatistics(), PublishWebGraph(), Ripper(), RipRepository(), SetupHandlers(), and ~Ripper().
bool | handler_cat |
simple handler to "cat" the repository.
bool | handler_catdate |
show the date of the document.
bool | handler_catlinks |
display anchor links as they are being processed.
bool | handler_caturl |
even simpler handler to "cat" just the URLs.
bool | handler_graph_print |
prints the web graph.
ofstream | indexout |
Definition at line 114 of file ripper.cc. Referenced by main(), RipRepository(), and SetupHandlers().
bool | interactive |
is ripper interactive or not?
int | jumptable_memory |
memory to reserve for jumptable.
int | leaftable_memory |
memory to reserve for leaftable.
bool | no_graphbuilder |
don't build the web link graph.
int | nodetable_memory |
memory to reserve for nodetable.
vector< ParseHandler * > | parsehandlers_ |
Definition at line 118 of file ripper.cc. Referenced by RipRepository(), SetupHandlers(), and ~Ripper().
bool | print_index |
prints the index of webnodes to a file.
bool | pvm_is_master |
slave or master?
int | pvm_numtasks |
number of PVM tasks in group.
vector< string > | rep_files_ |
Definition at line 89 of file ripper.cc. Referenced by main() and ParseCmdLineArgs().
bool | repos_from_stdin |
char * | rippername |
Definition at line 112 of file ripper.cc. Referenced by main(), ParseCmdLineArgs(), Ripper(), and SetupHandlers().
int | start_ID |
starting ID number for WebNodes.
int | stop_after |
if non-zero, stop processing after this many docs.
int | string_memory |
memory to reserve for string table.
char * | tempdir |
Definition at line 113 of file ripper.cc. Referenced by main(), ParseCmdLineArgs(), Ripper(), and SetupHandlers().