Google has started probing links sent through Gmail as seeds for Googlebot. The company is clearly anxious to catalog and index the entire web, even at the cost of troubling privacy concerns. Should the contents of private e-mail be used for Google's own purposes? The Gmail help center says no, but I have evidence otherwise...
A couple of weeks ago, I signed up for an account with DailyLit, a marvelous web site that sends you short installments of novels or other works. Most of these works are in the public domain, of course, but there is the odd author who has placed their works under a Creative Commons license, like Cory Doctorow or Charles Stross.
Installments are sent on a regular schedule -- for example, I'm currently reading Moby Dick via emails that get sent Monday through Friday at 5am Eastern time. Each e-mail ends with a couple of links; among these are a link to immediately send the next installment (instead of waiting for the schedule) and a link to pause the schedule (if, for example, you will be out of town).
On Saturday I started getting extra installments of Moby Dick at seemingly random times: 4:27pm and 8:33pm. I posted a bug report to DailyLit's forums and got this summary by private e-mail (though I've altered the URL, obviously):
It turns out that the "resend" function for your subscription was accessed by google bot -- as per the following lines from our logs
66.249.65.98 - - [13/Oct/2007:16:33:40 -0400] "GET /subs/resend/[unique-uuid-elided]/117 HTTP/1.1" 200 6667 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
We will immediately make sure that google does not access these pages. It's unclear to me how they discovered these pages in the first place other than from looking at links in gmail (do you read via gmail?).
I do read via Gmail, and I cannot imagine how else Googlebot got hold of the "send next installment" URL. I don't forward these e-mails, and I've never copied and pasted any of these URLs anywhere else. In short, I don't see how Googlebot could have found this URL unless it's been reading my e-mail...
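For anyone who wants to look for the same thing in their own server logs, here is a minimal sketch in Python of how the log line quoted above can be picked apart. It assumes the line is in Apache's "combined" log format, which is what the quoted entry looks like; the names COMBINED, parse_line and claims_googlebot are my own, not anything DailyLit uses. Note that it only reports what the client claims to be, since a User-Agent string is trivial to fake.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def claims_googlebot(entry):
    """True if the User-Agent says Googlebot. This is a claim, not proof."""
    return "Googlebot" in entry["agent"]


# The entry from the e-mail above, with the UUID still elided.
line = (
    '66.249.65.98 - - [13/Oct/2007:16:33:40 -0400] '
    '"GET /subs/resend/[unique-uuid-elided]/117 HTTP/1.1" 200 6667 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

entry = parse_line(line)
if entry and claims_googlebot(entry):
    print(entry["host"], entry["time"], entry["method"], entry["path"])
```

The "-" in the referer field just means the request carried no Referer header, so the log alone says nothing about where the crawler found the link.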