Experimenting with Mark Pilgrim’s pygoogle stuff: my first serious effort at Python.
import google  # this is much more than 6 lines, but that's what libraries are about
thisURL = 'related:http://www.paulbeard.org/movabletype'
data = google.doGoogleSearch(thisURL)
print len(data.results), "URLs found for", thisURL
for x in data.results:
    print '<a href="' + x.URL + '">' + x.title + '</A>'
That little bit of code generates this result:
9 URLs found for related:http://www.paulbeard.org/movabletype
<a href="/movabletype/">quotidian</A>
<a href="http://fsteele.dyndns.org/">Nicest of the Damned</A>
<a href="http://www.naveja.net/">Naveja.net</A>
<a href="http://radio.weblogs.com/0110735/">john mahoney's radio weblog</A>
<a href="http://www.rebeccablood.net/">what's in rebecca's pocket?</A>
<a href="http://www.theshiftedlibrarian.com/">The Shifted Librarian</A>
<a href="http://www.robotwisdom.com/">robot wisdom weblog</A>
<a href="http://www.ozzie.net/blog/stories/2002/08/04/why.html">Why?</A>
<a href="http://www.geocrawler.com/lists/3/FreeBSD/163/0/9429414/">Geocrawler.com – freebsd-mobile – Crash after resume: what caused </A>
To do:
- skip any result that matches the page we're searching against (that would knock out the first result above)
- build a way to pass in the URL of the current page (most likely via an environment variable the web server passes along from the browser's request)
- if I'm really ambitious, implement a crude cache by writing results to disk and serving those on repeated requests, to stretch my 1,000 queries a day as far as they'll go (a rough sketch of all three follows below)
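None of this exists yet, but here's a rough sketch of how the three items might fit together, built on the same google.doGoogleSearch() call used above. The SCRIPT_URI environment variable, the cache file location, and the exact-URL matching rule are all guesses on my part rather than anything pygoogle or the Google API dictates.

import os
import pickle
import time

import google

CACHE_FILE = '/tmp/related_cache.pkl'   # made-up location for the crude cache
CACHE_TTL = 24 * 60 * 60                # keep cached results around for a day

def current_page_url():
    # when run as a CGI script, the web server passes request details in
    # environment variables; SCRIPT_URI is one common choice, not guaranteed
    return os.environ.get('SCRIPT_URI', 'http://www.paulbeard.org/movabletype')

def related_links(page_url):
    # crude disk cache: reuse earlier results instead of spending another
    # of the day's 1,000 queries
    cache = {}
    if os.path.exists(CACHE_FILE):
        cache = pickle.load(open(CACHE_FILE, 'rb'))
    entry = cache.get(page_url)
    if entry and time.time() - entry['when'] < CACHE_TTL:
        return entry['results']

    data = google.doGoogleSearch('related:' + page_url)
    # skip any result that points back at the page we searched against;
    # this matches on the exact URL, which may need loosening in practice
    results = [(r.URL, r.title) for r in data.results
               if r.URL.rstrip('/') != page_url.rstrip('/')]

    cache[page_url] = {'when': time.time(), 'results': results}
    pickle.dump(cache, open(CACHE_FILE, 'wb'))
    return results

page = current_page_url()
for url, title in related_links(page):
    print '<a href="' + url + '">' + title + '</a>'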