Google Spidering addPostingForm
Squishdot Posted by Bruce Perens on Wednesday September 06, 04:11PM, 2000
Google has been web-spidering technocrat.net. Because there are over 7000 postings, that's a lot of accesses: it's getting close to one every second. One thing I notice is that it spiders the addPostingForm on every article, effectively doubling the number of accesses. Unfortunately, the robot exclusion standard isn't fine-grained enough to say "don't spider anything with this name", because its Disallow rules match only the start of a URL path. Thus, I propose to rename addPostingForm to addPostingForm.cgi as a hack to tell robots not to access it. Another thing I could do is add a default argument to all invocations of addPostingForm. That might put off the spiders, and I could switch on the presence or absence of that argument so that addPostingForm bails out without any processing when it's invoked without it.

Thanks

Bruce
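
The bail-out idea in the post above could be sketched roughly as follows. This is only an illustration in plain Python, not Squishdot or Zope code: the marker name `from_link`, the handler `add_posting_form`, the helper `render_form`, and the choice of a 204 status are all assumptions made for the example.

```python
# Hypothetical marker argument; the original post does not name one.
MARKER = "from_link"


def render_form(query_args):
    # Stand-in for the real (expensive) form rendering.
    return "<form>...</form>"


def add_posting_form(query_args):
    """Return (status, body) for a request to the posting form.

    query_args is a dict of already-parsed query-string arguments.
    """
    if MARKER not in query_args:
        # Cheap exit: nothing is looked up or rendered for callers
        # that did not come through a link carrying the marker.
        return 204, ""
    return 200, render_form(query_args)
```

Links generated by the site would then carry the marker (for example `addPostingForm?from_link=1`), so a request without it can be answered almost for free.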


The Fine Print: The following comments are owned by whoever posted them.

    Difficult :-S
    by Chris Withers on Friday September 08, 03:32PM, 2000
    (Guessing) Can you not put something like the following in robots.txt:

    Disallow: */addPostingForm
    ?

    I can't see any problems with renaming addPostingForm, as I think all the references to it are in DTML. It's not something I'd do in the Squishdot distribution, though...

    The default argument thing might be easier to do, since it'll just get parsed by Zope and not used.

    This is a tricky issue though :-S
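
To see why the `*/addPostingForm` guess is uncertain, here is a small sketch of prefix-only matching as described in the original robot exclusion standard. The function `disallowed` and the article path `/967248660/addPostingForm` are made up for illustration. Some crawlers treat `*` as a wildcard as an extension, so whether a given robot honours it would need checking against that robot's documentation.

```python
def disallowed(path, rules):
    """Prefix-only matching: a rule applies if the path starts with it.

    Empty rules are skipped, since an empty Disallow means "allow all".
    """
    return any(rule and path.startswith(rule) for rule in rules)


# Hypothetical article URL path, used only for this example.
article_form = "/967248660/addPostingForm"

# Under prefix matching the "*" is just a literal character.
wildcard_guess = disallowed(article_form, ["*/addPostingForm"])  # False

# A plain rule only covers a form at the site root.
root_rule_on_article = disallowed(article_form, ["/addPostingForm"])  # False
root_rule_on_root = disallowed("/addPostingForm", ["/addPostingForm"])  # True
```

With one form per article, a prefix-only robots.txt would need one Disallow line per article, which is the granularity problem the original post describes.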
    Re: Google Spidering addPostingForm
    by Darrick Wong on Friday September 08, 10:57PM, 2000
    You could also do this to keep Googlebot out:

    <html><head>
    <meta name="robots" content="noindex,nofollow">
    ...
    </head><body>

    Details available at http://info.webcrawler.com/mak/projects/robots/meta-user.html. Keep in mind that not all robots support this tag, although I think Googlebot does.

    --Darrick
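
As a rough way to check that a rendered page really carries the tag, the following sketch pulls the robots directives out of a page using Python's standard `html.parser`. The class `RobotsMetaParser` and the function `robots_directives` are names invented for this example. It shows what a cooperating robot might look for, not how Googlebot actually works.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when written without "=value".
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head><body>'
found = robots_directives(page)  # {"noindex", "nofollow"}
```

Note that a robot still has to fetch the page to see the tag, so this controls indexing and link-following rather than the fetch itself.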
    Re: Google Spidering addPostingForm
    by Erich on Saturday September 16, 07:42PM, 2000
    Appending a ? to the URL might also work:
    http://test.com/url?
    isn't spidered by most robots, I think.
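
Both this suggestion and the `.cgi` rename rely on a robot skipping URLs that look dynamically generated. A guess at what such a heuristic might look like is sketched below. The function `looks_dynamic` is hypothetical and is not taken from any real crawler.

```python
from urllib.parse import urlsplit


def looks_dynamic(url):
    """Guess whether a URL points at a script rather than a static page."""
    # Test the raw string for "?": urlsplit() reports an empty query
    # for a bare trailing "?", so the parsed query cannot be used here.
    if "?" in url:
        return True
    return urlsplit(url).path.endswith(".cgi")


with_question_mark = looks_dynamic("http://test.com/url?")  # True
plain_page = looks_dynamic("http://test.com/url")  # False
renamed_form = looks_dynamic("http://test.com/1/addPostingForm.cgi")  # True
```

Whether any particular robot applies a rule like this is an assumption, so either trick would need to be confirmed against the server logs after the change.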
    All trademarks and copyrights on this page are owned by their respective companies. Comments are owned by the Poster. The Rest ©1999 Butch Landingin, ©2000-2002 Chris Withers.