Nutch-sa

From KallestadWiki


Top URLs in your domain

bin/nutch readdb crawldb -topN NumberOfUrls outputdir

DB Stats

bin/nutch readdb crawldb -stats

List of Inbound and Outbound URLs

bin/nutch readlinkdb linkdb -dump outputdir

or

bin/nutch readlinkdb linkdb -url urltoquery

For the -url option, the URL must be fully qualified: it must start with the protocol (http://, https://, etc.) and include a trailing slash for directories.
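
The protocol requirement can be checked before running the query. The following is only a minimal sketch: the function names has_protocol and query_inlinks, the NUTCH_BIN and LINKDB variables, and the list of accepted protocols are all assumptions made for illustration, not part of Nutch. It does not check the trailing slash, since a plain string test cannot tell whether a URL points to a directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): succeeds when the URL starts
# with one of a few common protocols. The protocol list is an assumption.
has_protocol() {
  case "$1" in
    http://*|https://*|ftp://*|file://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: runs the readlinkdb -url query only when the URL
# includes a protocol. NUTCH_BIN and LINKDB are assumed variables that
# default to the paths used in the commands above.
query_inlinks() {
  url=$1
  if ! has_protocol "$url"; then
    echo "URL must start with a protocol such as http:// : $url" >&2
    return 1
  fi
  "${NUTCH_BIN:-bin/nutch}" readlinkdb "${LINKDB:-linkdb}" -url "$url"
}
```

With these definitions loaded, query_inlinks "http://www.example.com/docs/" would run the query, while query_inlinks "www.example.com/docs" would print an error and return without calling Nutch.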
