Editing nutch
From KallestadWiki
The following is a set of changes that I made to the Nutch source files to fit my own site's needs.
Changed Pages
- pages/en/search.xml
action="/search/en/search.jsp"
This ensures that the form gets posted to the proper location and doesn't rely on relative links to get there. (The new links are relative to the site root, not to the published page.)
<img border="0" src="/images/poweredbynutch_01.gif"/>
Change the image to a location on my static webserver.
<a href="/search/en/help.html">help</a>
Again changing to avoid reliance on relative links.
- pages/en/help.xml
Minor language edits
- pages/en/about.xml
Replaced with my own text
- locale/org/nutch/jsp/search_en.properties
Edit title
Edit hits
- locale/org/nutch/jsp/cached_en.properties
Edit title
Edit noContent
- locale/org/nutch/jsp/text_en.properties
Edit title
Edit note
Edit noText
- locale/org/nutch/jsp/explain_en.properties
Edit title
- locale/org/nutch/jsp/anchors_en.properties
Edit title
Edit anchors
Copy the English versions back to their default counterparts:
cp search_en.properties search.properties
cp cached_en.properties cached.properties
cp text_en.properties text.properties
cp explain_en.properties explain.properties
cp anchors_en.properties anchors.properties
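The five copies above can also be done in a loop. This is only a minimal sketch: the function name copy_en_defaults and its directory argument are my own illustrative choices, not anything Nutch provides, and it assumes every file ending in _en.properties in that directory should overwrite its default counterpart.

```shell
# Copy each *_en.properties file in a directory to its default
# (language-neutral) counterpart, e.g. search_en.properties -> search.properties.
# copy_en_defaults is a hypothetical helper name, not part of Nutch.
copy_en_defaults() {
    dir="${1:-.}"
    for src in "$dir"/*_en.properties; do
        # If nothing matches, the glob stays unexpanded; skip it.
        [ -e "$src" ] || continue
        # Strip the "_en.properties" suffix and append ".properties".
        dest="${src%_en.properties}.properties"
        cp "$src" "$dest"
    done
}
```

Run from the source root, it would be called as: copy_en_defaults locale/org/nutch/jsp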
- style/nutch-header.xsl
Change all links and all images to be relative to the site root.
- style/nutch-page.xsl
Edit title
Edit shortcut icons
Make images relative to site root
- include/footer.html
Remove External Language Links
- include/en/header.xml
Remove nutch FAQ link
<item><a href="http://wiki.apache.org/nutch/FAQ">FAQ</a></item>
Change about.html link to /search/en/about.html
In the jsp directory:
- anchors.jsp
Update base href to reflect my site's fully qualified URL
<base href="<%= "http://www.mydomain.com/search/" + language %>/">
- explain.jsp
Update base href to reflect my site's fully qualified URL
- search.jsp
Change hitsPerSite to 100
int hitsPerSite = 100;
Update favicon references
Remove RSS alternate link
Update form action to /search/search.jsp from ../search.jsp
Update help link to /search/en/help.html
Update cached.jsp link to /search/cached.jsp
Update explain link to /search/explain.jsp
Update anchors link to /search/anchors.jsp
Update more link to /search/search.jsp
Update second form action to /search/search.jsp
Remove RSS link (the entire table, actually)
Remove FAQ link
Update base href to reflect my site's fully qualified URL
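With this many link changes it is easy to miss one. A rough way to check is to search the JSP files for any remaining "../" references, such as the old ../search.jsp form action. This is only a sketch: find_relative_links is a hypothetical helper name, and it only catches links written with "../", not other kinds of relative paths.

```shell
# Print file name, line number, and text of every line in the JSP files
# that still contains a "../" reference.
# find_relative_links is a hypothetical helper name, not part of Nutch.
find_relative_links() {
    dir="${1:-.}"
    # -n prints line numbers; -F treats '../' as a fixed string, not a regex.
    # /dev/null is added so grep always prints the file name, even when
    # only one .jsp file matches the glob.
    grep -n -F '../' "$dir"/*.jsp /dev/null
}
```

Calling find_relative_links with the jsp directory prints nothing (and returns a non-zero status) once no "../" references are left.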
- text.jsp
Build Nutch
After you've made the appropriate changes, rebuild Nutch:
ant war jar
