There has been a boatload of discussion in the SEO/Webmaster community regarding Google Sitemaps since the functionality was made publicly available (and, to a lesser extent, regarding the similar functionality provided by other search engines).
It seems to me that an incredibly large segment of the community has completely missed a major point, and in doing so has created an entirely new set of problems. First off, there has been a big emphasis on site size, with the number of pages indexed treated as a key factor in a site's weight in the SEO/SEM world. Sure, if you have a lot of content, and that content is quality content, then your site should be stronger than sites with less quality content. The problem is that too many people took this to the extreme. That happened partially because Google didn't foresee the coming flood of web spam, and partially because it didn't react to the problem quickly enough, but mostly because there are entirely too many people out there who are willing to make the entire world suffer so they can monetize traffic at less than a nickel per visitor.
It's an inherent problem with SEO: in order for the little guys to compete, they simply have to emulate the tactics of the larger players. With more and more niche communities overrun by low-quality sites run by less-than-responsible webmasters, Google has had to take actions such as penalizing sites, removing sites from its index, and even penalizing or removing entire networks of sites run by known-to-be-evil SEOs.
Having a large directory doesn't make a site better, not if the directory isn't maintained or is simply duplicate content from another site. A blog with daily updated content isn't necessarily good if that content is all machine-generated to target specific keywords. But that doesn't mean those kinds of things don't have their place. Archiving mailing lists can set you up for huge duplicate content penalties, but not archiving them to avoid those penalties means the community at large has fewer resources to turn to. The same goes for publicly posting man pages, documentation excerpts, or even whole copies of documentation.
The way I see it, a webmaster should provide value to his community. If that means duplicating content that can be found elsewhere, so be it. A single site with focused information, even if it is duplicate data from several sources, is a valuable resource to members within its niche.
This is really where sitemaps come into play in my mind, and where people inevitably go wrong. Sitemaps give you the ability to rank your pages by importance. All of your unique content should be ranked significantly higher than your sourced content. You shouldn't rely on Google to index your sourced content; if you want that content to be searchable on-site, you really should implement your own site search. Rightfully, Google should be directing traffic to the original author's site, at least until that site is no longer available.
Too many people don't think at all when building their sitemaps. They simply rank everything at one level (either all highest importance or all medium importance). Additionally, people submit a lot of junk pages, such as tag clouds, category listing pages, or search results, which end up clouding the relevance of their more important pages.
When building a sitemap, you really do need to step back and ask: which content is most important, given the traffic Google might bring me? Build the map accordingly.
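As a rough illustration of the ranking approach described above, here is a minimal Python sketch that builds a sitemap giving unique content a high priority and sourced content a low one, and that leaves junk pages out entirely. It assumes the sitemaps.org 0.9 XML format, whose optional priority element ranges from 0.0 to 1.0 with a default of 0.5. The page kinds ("unique", "sourced", "tag_cloud", and so on), the build_sitemap helper, and the specific priority values are hypothetical choices for illustration, not values Google recommends.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9 (assumed format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical priority tiers: unique content ranks well above sourced content.
PRIORITY_BY_KIND = {
    "unique": 1.0,   # original articles you wrote yourself
    "sourced": 0.2,  # mirrored docs, mailing-list archives, man pages
}

# Hypothetical page kinds that only dilute relevance, so they are never submitted.
EXCLUDED_KINDS = {"tag_cloud", "category_listing", "search_results"}


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, kind) pairs, skipping junk pages."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." instead of ns0: prefixes
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, kind in pages:
        if kind in EXCLUDED_KINDS:
            continue  # leave tag clouds, listings, and search results out of the map
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # Unknown kinds fall back to the protocol's default priority of 0.5.
        priority = PRIORITY_BY_KIND.get(kind, 0.5)
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = [
        ("https://example.com/my-article.html", "unique"),
        ("https://example.com/docs/man/ls.html", "sourced"),
        ("https://example.com/tags/", "tag_cloud"),
    ]
    print(build_sitemap(pages))
```

The point of the tiers is that the numbers only mean something relative to each other: if every page gets 1.0, or every page gets 0.5, the map tells the search engine nothing about which pages you actually care about.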
Google would really do the community at large a service if they produced better guidelines for ranking pages, but I don't see them doing that without an intelligent set of scenarios and ranking strategies first being discussed within their small community.
