Adam Howitt's Blog

Apr 10, 2008

Google Sitemaps made easy with Linux

I just discovered that Google Webmaster Tools now lets you register Google Sitemaps in several formats: RSS, simple text, or the Sitemap XML format.

This may not be exciting news to most, but the simple text format makes it far easier to get started with a sitemap: the file is just a list of URLs, one per line.

I was about to fire up a text editor and some tunes to rip through the site and collect the page names by hand when I realized that the Linux find command prints a newline-separated list of matching files. A quick

find . -name \*.htm
yielded the foundation of what I needed. Note that the backslash escapes the *, so the shell passes it to find as a pattern instead of expanding it. This gave me output as follows:
./index.htm
./thanks_mailing.htm
./resources.htm
./closing copy.htm
./header.htm
./vegas_17.htm
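Run from the site root, two small tweaks may make this listing more predictable: -type f skips any directory whose name happens to end in .htm, and sort gives a stable order, since find returns files in directory-traversal order rather than alphabetically. A minimal sketch, assuming a POSIX shell; list_pages is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: list .htm pages under a directory (default: the current one),
# regular files only, sorted in byte order so the output is stable between runs.
list_pages() {
    dir="${1:-.}"
    # -type f excludes directories whose names end in .htm
    find "$dir" -type f -name '*.htm' | LC_ALL=C sort
}
```

Usage: cd to the site root and run list_pages.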

The last step was to use sed's substitute command to replace the leading . with the site URL (the / after it stays, so ./index.htm becomes http://www.mysite.com/index.htm) and redirect the result into the sitemap.txt file:

find . -name \*.htm | sed 's|^\.|http://www.mysite.com|' > sitemap.txt
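One catch: ./closing copy.htm contains a space, which is not valid inside a URL as-is. A minimal sketch of the whole pipeline that encodes spaces as %20 and takes the base URL as a parameter, assuming a POSIX shell and that the current directory is the site root. make_sitemap is a hypothetical name, and its default http://www.mysite.com is the same placeholder used above. It only encodes spaces; other characters that need percent-encoding (such as #, ? or %) pass through unchanged.

```shell
#!/bin/sh
# Sketch: print a plain-text sitemap (one absolute URL per line) for every
# .htm file under the current directory, assumed to be the site root.
make_sitemap() {
    # Base URL without a trailing slash; the default is a placeholder.
    base_url="${1:-http://www.mysite.com}"
    # 1. find lists regular .htm files as ./dir/page.htm
    # 2. sort (byte order) keeps the output stable between runs
    # 3. sed drops the leading "." and encodes spaces as %20
    # 4. printf prepends the base URL, so characters such as | or & in
    #    the URL never reach sed
    find . -type f -name '*.htm' \
        | LC_ALL=C sort \
        | sed -e 's/^\.//' -e 's/ /%20/g' \
        | while IFS= read -r path; do
            printf '%s%s\n' "$base_url" "$path"
        done
}
```

Usage: make_sitemap http://www.mysite.com > sitemap.txt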
