
Ripper Class Reference

Rips the repository and interfaces with main(). Calls GraphBuilder to build the web link graph.

Public Methods

 Ripper ()
 ~Ripper ()
void SetupHandlers ()
void ParseCmdLineArgs (int argc, char **argv)
 sets flags for all the possible command line options.

void RipRepository (ReposReader *rr)
 reads each document and builds a WebNode from it through GraphBuilder.

void PrintStatistics (ostream &o)
 prints out some memory usage statistics. Useful for out of memory errors.

WebLinkGraph * PublishWebGraph ()
GraphBuilder * GetGraphBuilder ()

Public Attributes

vector< string > rep_files_
struct {
   int   stop_after
   int   string_memory
   int   jumptable_memory
   int   nodetable_memory
   int   leaftable_memory
   int   start_ID
   int   pvm_numtasks
   bool   pvm_is_master
   bool   repos_from_stdin
   bool   interactive
   bool   print_index
   bool   handler_cat
   bool   handler_caturl
   bool   handler_catdate
   bool   handler_graph_print
   bool   handler_catlinks
   bool   no_graphbuilder
} flags_
char * rippername
char * tempdir
ofstream indexout

Private Attributes

vector< ParseHandler * > parsehandlers_
GraphBuilder * gb

Detailed Description

Rips the repository and interfaces with main(). Calls GraphBuilder to build the web link graph.

Definition at line 77 of file ripper.cc.
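
The member documentation below lists ParseCmdLineArgs(), SetupHandlers(), RipRepository() and PrintStatistics() as referenced by main(), but it does not give the call order. The following is a minimal sketch of one plausible driver sequence, not the actual main(). RipperStub, ReposReaderStub and RunRipperSketch() are hypothetical stand-ins that only record calls, and the order shown is an assumption.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for ReposReader; only the type name comes from this page.
struct ReposReaderStub {};

// Hypothetical stand-in that mirrors Ripper's public method names.
// Each body only logs the call; the real work in ripper.cc is not reproduced.
class RipperStub {
public:
    void ParseCmdLineArgs(int argc, char** argv) {
        (void)argv;
        assert(argc >= 1);  // a program name is expected at argv[0]
        log_.push_back("ParseCmdLineArgs");
    }
    void SetupHandlers() { log_.push_back("SetupHandlers"); }
    void RipRepository(ReposReaderStub* rr) {
        assert(rr != nullptr);
        log_.push_back("RipRepository");
    }
    void PrintStatistics(std::ostream& o) {
        // Reports how many calls were logged before this one.
        o << "calls made: " << log_.size() << "\n";
        log_.push_back("PrintStatistics");
    }
    const std::vector<std::string>& log() const { return log_; }

private:
    std::vector<std::string> log_;
};

// Assumed order: parse options, install handlers, rip, then report statistics.
std::string RunRipperSketch(int argc, char** argv) {
    RipperStub ripper;
    ReposReaderStub reader;
    ripper.ParseCmdLineArgs(argc, argv);
    ripper.SetupHandlers();
    ripper.RipRepository(&reader);
    std::ostringstream stats;
    ripper.PrintStatistics(stats);
    return stats.str();
}
```

Parsing options before SetupHandlers() is chosen here because SetupHandlers() references flags_, which ParseCmdLineArgs() is documented to set.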


Constructor & Destructor Documentation

Ripper::Ripper ()
 

Definition at line 123 of file ripper.cc.

References defaultrippername, defaulttempdir, flags_, gb, NULL, rippername, and tempdir.

Ripper::~Ripper ()
 

Definition at line 154 of file ripper.cc.

References gb, and parsehandlers_.


Member Function Documentation

GraphBuilder* Ripper::GetGraphBuilder () [inline]
 

Definition at line 87 of file ripper.cc.

References gb.

Referenced by main().

void Ripper::ParseCmdLineArgs (int argc, char **argv)
 

sets flags for all the possible command line options.

Definition at line 253 of file ripper.cc.

References flags_, num_docs_processed, rep_files_, RIPPER_NAMELEN, RIPPER_TMPDIRLEN, rippername, tempdir, and usage().

Referenced by main().

void Ripper::PrintStatistics (ostream &o)
 

prints out some memory usage statistics. Useful for out of memory errors.

Definition at line 246 of file ripper.cc.

References gb, num_docs_processed, and GraphBuilder::StatisticsMem().

Referenced by main(), and OutOfMemory().

WebLinkGraph* Ripper::PublishWebGraph () [inline]
 

Definition at line 86 of file ripper.cc.

References gb, and GraphBuilder::UndockWebGraph().

Referenced by main().

void Ripper::RipRepository (ReposReader *rr)
 

reads each document and builds a WebNode from it through GraphBuilder.

Definition at line 213 of file ripper.cc.

References ReposReader::AtEnd(), flags_, gb, indexout, GraphBuilder::NodeGetAlias(), GraphBuilder::NodeGetAlias_(), GraphBuilder::NodeGetDate(), GraphBuilder::NodeGetID(), GraphBuilder::NodeGetURL(), GraphBuilder::NodeGetURL_(), GraphBuilder::NodeInitialize(), GraphBuilder::NodeInsertLinks(), GraphBuilder::NodeLaunch(), num_docs_processed, parsehandlers_, and ParseElt::Process_Document().

Referenced by main().
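
The body of RipRepository() is not reproduced on this page, but its references (ReposReader::AtEnd(), flags_, num_docs_processed) together with the stop_after description suggest the general shape of the loop. The following is a minimal sketch under that assumption. ReposReaderSketch, NextDocument() and RipSketch() are hypothetical names; the real method also drives the parse handlers and GraphBuilder, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory reader. Only AtEnd() is named in the reference above;
// NextDocument() is invented for this sketch.
class ReposReaderSketch {
public:
    explicit ReposReaderSketch(std::vector<std::string> docs)
        : docs_(std::move(docs)) {}
    bool AtEnd() const { return pos_ >= docs_.size(); }
    const std::string& NextDocument() { return docs_[pos_++]; }

private:
    std::vector<std::string> docs_;
    std::size_t pos_ = 0;
};

// Reads documents until the reader is exhausted or, when stop_after is
// non-zero, until that many documents have been processed.
// Returns the number of documents processed.
int RipSketch(ReposReaderSketch* rr, int stop_after) {
    assert(rr != nullptr);
    int num_docs_processed = 0;
    while (!rr->AtEnd()) {
        if (stop_after != 0 && num_docs_processed >= stop_after) {
            break;
        }
        const std::string& doc = rr->NextDocument();
        (void)doc;  // the real code would pass the document on for parsing
        ++num_docs_processed;
    }
    return num_docs_processed;
}
```

With stop_after set to 0 the sketch consumes the whole reader; with a positive value it leaves the remaining documents unread.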

void Ripper::SetupHandlers ()
 

Definition at line 162 of file ripper.cc.

References flags_, gb, indexout, MakeCatDateHandler(), MakeCatHandler(), MakeCatURLHandler(), MakeGraphHandler(), parsehandlers_, RIPPER_NAMELEN, rippername, tempdir, and usage().

Referenced by main().


Member Data Documentation

struct { ... } Ripper::flags_
 

Referenced by main(), ParseCmdLineArgs(), Ripper(), RipRepository(), and SetupHandlers().
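
For orientation, the anonymous struct behind flags_ can be pictured as below. This is a layout sketch assembled from the member list and per-member descriptions on this page, ordered by the definition line numbers given for each member. The type name RipperFlagsSketch and the helper MakeZeroedFlags() are hypothetical, and the defaults that the Ripper() constructor actually assigns are not shown on this page.

```cpp
#include <cassert>

// Layout sketch of the anonymous struct Ripper::flags_.
// Comments repeat the descriptions from the member documentation.
struct RipperFlagsSketch {
    int  stop_after;           // if non-zero, stop processing after this many docs
    int  string_memory;        // memory to reserve for string table
    int  jumptable_memory;     // memory to reserve for jumptable
    int  nodetable_memory;     // memory to reserve for nodetable
    int  leaftable_memory;     // memory to reserve for leaftable
    int  start_ID;             // starting id number for WebNodes
    int  pvm_numtasks;         // number of pvm tasks in group
    bool pvm_is_master;        // slave or master?
    bool repos_from_stdin;     // not described on this page
    bool interactive;          // is ripper interactive or not?
    bool print_index;          // prints the index of webnodes to a file
    bool handler_cat;          // simple handler to "cat" repository
    bool handler_caturl;       // even simpler handler to "cat" just urls
    bool handler_catdate;      // show date of the document
    bool handler_graph_print;  // prints the web graph
    bool handler_catlinks;     // display anchor links as they are being processed
    bool no_graphbuilder;      // don't build the web link graph
};

// Value-initialization zeroes every member. This is a placeholder only,
// not the defaults chosen by the real constructor.
RipperFlagsSketch MakeZeroedFlags() {
    return RipperFlagsSketch{};
}
```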

GraphBuilder* Ripper::gb [private]
 

Definition at line 120 of file ripper.cc.

Referenced by GetGraphBuilder(), PrintStatistics(), PublishWebGraph(), Ripper(), RipRepository(), SetupHandlers(), and ~Ripper().

bool Ripper::handler_cat
 

simple handler to "cat" repository.

Definition at line 104 of file ripper.cc.

bool Ripper::handler_catdate
 

show date of the document.

Definition at line 106 of file ripper.cc.

bool Ripper::handler_catlinks
 

display anchor links as they are being processed.

Definition at line 108 of file ripper.cc.

bool Ripper::handler_caturl
 

even simpler handler to "cat" just urls.

Definition at line 105 of file ripper.cc.

bool Ripper::handler_graph_print
 

prints the web graph.

Definition at line 107 of file ripper.cc.

ofstream Ripper::indexout
 

Definition at line 114 of file ripper.cc.

Referenced by main(), RipRepository(), and SetupHandlers().

bool Ripper::interactive
 

is ripper interactive or not?

Definition at line 102 of file ripper.cc.

int Ripper::jumptable_memory
 

memory to reserve for jumptable.

Definition at line 95 of file ripper.cc.

int Ripper::leaftable_memory
 

memory to reserve for leaftable.

Definition at line 97 of file ripper.cc.

bool Ripper::no_graphbuilder
 

don't build the web link graph.

Definition at line 109 of file ripper.cc.

int Ripper::nodetable_memory
 

memory to reserve for nodetable.

Definition at line 96 of file ripper.cc.

vector<ParseHandler*> Ripper::parsehandlers_ [private]
 

Definition at line 118 of file ripper.cc.

Referenced by RipRepository(), SetupHandlers(), and ~Ripper().

bool Ripper::print_index
 

prints the index of webnodes to a file.

Definition at line 103 of file ripper.cc.

bool Ripper::pvm_is_master
 

slave or master?

Definition at line 100 of file ripper.cc.

int Ripper::pvm_numtasks
 

number of pvm tasks in group.

Definition at line 99 of file ripper.cc.

vector<string> Ripper::rep_files_
 

Definition at line 89 of file ripper.cc.

Referenced by main(), and ParseCmdLineArgs().

bool Ripper::repos_from_stdin
 

Definition at line 101 of file ripper.cc.

char* Ripper::rippername
 

Definition at line 112 of file ripper.cc.

Referenced by main(), ParseCmdLineArgs(), Ripper(), and SetupHandlers().

int Ripper::start_ID
 

starting id number for WebNodes.

Definition at line 98 of file ripper.cc.

int Ripper::stop_after
 

if non-zero, stop processing after this many docs.

Definition at line 93 of file ripper.cc.

int Ripper::string_memory
 

memory to reserve for string table.

Definition at line 94 of file ripper.cc.

char* Ripper::tempdir
 

Definition at line 113 of file ripper.cc.

Referenced by main(), ParseCmdLineArgs(), Ripper(), and SetupHandlers().


Generated on Wed May 29 11:37:27 2002 for MarkovPR by doxygen 1.2.15