Well, not really. But Google Sitemaps does employ XML to give program members a way to have their sites crawled after they make updates or alterations.
Before we get ahead of ourselves and in case you haven’t heard, yesterday, Google launched Sitemaps, a “collaborative crawling” service designed to keep Google informed of modifications to your web site so their search index can reflect these changes… or as Rusty calls it, a free pay-for-inclusion program.
Sitemaps works by taking advantage of XML and RSS capabilities. By placing an XML file on your web server, you inform Google when changes occur, and Google responds by crawling the updated pages and making the necessary updates to the search index. Over at the Google Blog, Engineering Director Shiva Shivakumar explained why Google launched Sitemaps:
“Initially, we plan to use the URL information webmasters supply to further improve the coverage and freshness of our index. Over time that will lead to our doing an even better job of delivering more search results from more websites.”
Shiva also gave an extensive interview to Danny Sullivan over at the SearchEngineWatch Blog. In it, Shiva reiterates that the Sitemaps program is currently in beta; he won’t guarantee that each submitted URL will be crawled. He did indicate, however, that this is something they’re working toward.
As mentioned, in order to participate in Sitemaps, you have to have a Google account, and you have to place an XML file on the webserver being used by your site. This is done to inform Google’s crawlers of what URLs to look for and how often those pages change. As pointed out by Rusty, over at SocialPatterns.com, SEM Michael Nguyen broke down an example of his Sitemaps XML code line by line to shed some light on what’s actually being done.
The XML file must also include the URL of each page you want in the Sitemaps program. If you have four pages that undergo frequent change, all four URLs should be listed; if you want an entire site included, you have to include the URL of every page. By employing the changefreq and priority XML tags, you can also indicate how frequently each page changes and how important it is.
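To make that concrete, here’s a minimal Python sketch that builds a sitemap file of the kind described above. The example URLs are made up, and the 0.84 schema namespace reflects the format Google documented at launch, so check Google’s own documentation before relying on it:

```python
# Build a minimal Sitemaps XML document from a list of pages.
# URLs below are illustrative; the 0.84 namespace is the one Google
# documented when the program launched and may change over time.

pages = [
    # (URL, how often the page changes, relative priority 0.0-1.0)
    ("http://www.example.com/", "daily", "1.0"),
    ("http://www.example.com/news.html", "hourly", "0.8"),
]

entries = []
for loc, changefreq, priority in pages:
    entries.append(
        "  <url>\n"
        f"    <loc>{loc}</loc>\n"
        f"    <changefreq>{changefreq}</changefreq>\n"
        f"    <priority>{priority}</priority>\n"
        "  </url>"
    )

xml_doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">\n'
    + "\n".join(entries)
    + "\n</urlset>"
)
print(xml_doc)
```

Each url entry gets its own loc, changefreq, and priority, which is exactly the per-page detail the program asks for.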
After the XML is complete, you must submit it to the Sitemaps program. This is where the Google account comes in. Once the URL of the sitemap is submitted, your task is complete.
There are a couple of methods for producing an XML sitemap: you can download Google’s sitemap generator or develop your own. The generator is an open-source Python script that has to be uploaded to the webserver. According to the FAQ, the sitemap generator “can create sitemaps from URL lists, webserver directories, or from access logs.”
You can also develop your own XML sitemap if you so choose; this will have to be submitted as well. The final method Google accepts is a text file containing the URLs you want in the program. Obviously, this method is reserved for those who have little experience dealing with webservers or structural web alterations. It also seems that text files will be given the lowest priority, at least until the program is off and running.
Whether you should take part in the Google Sitemaps program is a simple question: if search engines play any role in your business whatsoever, you should be a part of it. Having Google’s index (or any other search engine’s, for that matter) reflect changes to your site quickly will only benefit your search engine presence. Or, as Nathan Weinberg says, it’d be stupid not to.
An additional point of interest is that Google made Sitemaps as open-source as possible… at least on the XML end. By writing the sitemap generator in Python and releasing it under the Attribution/Share Alike Creative Commons license, Google is only furthering its embrace of open source. This also allows the program to be adapted to support other search engines.
Update: Someone who emailed me installed the sitemap generator on his webserver, and evidently the server went boom… or at least it was overtaxed. Here’s a quote from his post discussing the event: “Running it brought down my 3200MHz Pentium 4 running Debian Linux and 2 GB of RAM.”
Read Theo’s report and see for yourselves.