Wednesday, September 12, 2007

Quick Check of Google Crawl

If you are not using Google's Webmaster Tools, here is a quick Bash script that checks how often Google's spider crawls your site.

First, type cache:your.website.here into a Google search.
Then note the URL of the cached page in your browser's address bar and save it (it must include the IP address).

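#!/bin/bash
set -o errexit
stamp=$(date)
# Replace cached.url.here with the cache URL you saved above.
# Dump Google's cached copy, keep the "retrieved" line, and trim it with cut
# (adjust the 4-50 character range to suit your output).
lynx -dump -accept_all_cookies "cached.url.here" | grep 'retrieved' | cut -c 4-50 > temp.txt
cache=$(cat temp.txt)
rm -f temp.txt
echo "$stamp Google $cache" >> /path/to/desired/dir/file.txt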

Check out the documentation on the cut command, which receives the output of grep. It truncates the characters written to temp.txt, so adjust the character range (4-50 here) to get the result you want.
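To see what the cut range does, here is a small sketch run against a hypothetical line of lynx -dump output. The three-space indent on that line is an assumption about how lynx lays out the cache banner, so check your own dump. It also shows a pattern-based alternative with grep -o, which keeps only the matching phrase and so does not need the column range tuned.

```shell
#!/bin/bash
# Hypothetical line from lynx -dump output; the 3-space indent is an assumption.
line='   retrieved on Aug 30, 2007 13:49:11 GMT.'

# Column-based: keep characters 4 through 50, which drops the 3-space indent.
printf '%s\n' "$line" | cut -c 4-50

# Pattern-based alternative: print only the part of the line matching the
# "retrieved on ... GMT." phrase, wherever it starts on the line.
printf '%s\n' "$line" | grep -o 'retrieved on .* GMT\.'
```

Both commands print the same text for this line; the grep -o form keeps working if the banner shifts to a different column.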

This should give you results like these:
Fri Aug 31 14:18:05 CDT 2007 Google retrieved on Aug 30, 2007 13:49:11 GMT.
Tue Sep 4 09:10:20 CDT 2007 Google retrieved on Aug 31, 2007 14:35:14 GMT.
Wed Sep 5 07:51:55 CDT 2007 Google retrieved on Sep 2, 2007 15:52:02 GMT.
Thu Sep 6 13:01:19 CDT 2007 Google retrieved on Sep 4, 2007 22:35:39 GMT.
Fri Sep 7 07:00:00 CDT 2007 Google retrieved on Sep 5, 2007 13:25:22 GMT.
Sat Sep 8 07:00:00 CDT 2007 Google retrieved on Sep 6, 2007 13:28:59 GMT.
Sun Sep 9 07:00:00 CDT 2007 Google retrieved on Sep 8, 2007 08:19:05 GMT.
Mon Sep 10 07:00:00 CDT 2007 Google retrieved on Sep 8, 2007 08:19:05 GMT.
Tue Sep 11 07:00:00 CDT 2007 Google retrieved on Sep 10, 2007 08:54:21 GMT.
Wed Sep 12 07:00:00 CDT 2007 Google retrieved on Sep 10, 2007 23:52:44 GMT.
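To turn the log into a rough spider rate, a sketch like the following counts how many runs were logged and how many distinct "retrieved on" timestamps appear. The function name crawl_summary is hypothetical, and it assumes every log line has the format shown above.

```shell
#!/bin/bash
# Hypothetical helper: summarize a crawl log written by the script above.
# Each line is assumed to end with "... Google retrieved on <date> GMT."
crawl_summary() {
  local log=$1
  local checks crawls
  # Total number of logged runs (one line per run). The arithmetic
  # expansion strips any padding that some wc implementations add.
  checks=$(( $(wc -l < "$log") ))
  # Keep only the text after "retrieved on " (Google's cache timestamp)
  # and count the distinct values; each new value means a re-crawl.
  crawls=$(( $(sed -n 's/.*retrieved on //p' "$log" | sort -u | wc -l) ))
  echo "$checks checks, $crawls distinct crawl timestamps"
}

# Usage (hypothetical path):
# crawl_summary /path/to/desired/dir/file.txt
```

Against the ten sample lines above this reports "10 checks, 9 distinct crawl timestamps", since Sep 8, 2007 08:19:05 appears twice. Because the script samples only once a day, this is a lower bound: Google may have crawled more often between checks.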

It's a quick and dirty log of when Google crawled my site. I then added the script to crontab to run every morning at 4 a.m., and my browser is set to open the log file when it starts.
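For reference, a crontab entry that runs the script daily at 4 a.m. might look like this. The script path /path/to/googlecrawl.sh is hypothetical, so substitute wherever you saved the script and make it executable with chmod +x.

```
# minute hour day-of-month month day-of-week  command
# Hypothetical script path: run the crawl check every day at 04:00.
0 4 * * * /path/to/googlecrawl.sh
```

Add the line with crontab -e. Cron runs jobs with a minimal PATH, so if lynx is not in /usr/bin or /bin, call it by its full path inside the script.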

Enjoy.
