Thanks! Any publicity is good.
"No one seems to have a 'definitive' set of instructions or guidelines to lock out Picscout bots, etc."

There is no definitive set of instructions to lock out Picscout bots, and there never will be. As long as someone makes a file publicly accessible in some way, a bot can get to it.
What I am trying to do is broader. But I do block many of the avenues image bots use to cause big problems for blogs. Those willing to work through the outline will be able to dramatically reduce the amount of scraping-- and that will interfere with Picscout, TinEye, etc. But it's not going to be a short outline. It's a system of a lot of simple things-- many of which are useful to do anyway-- all implemented together, and then a few "biggies" that no one does that draw everything together.
"Nothing I have done can compare to what Lucia is doing technically..."
Part of what I've been doing is collecting information. I am now collecting "kill" logs from 9 domains-- only 2 controlled by me. (Initially, I only collected mine. Now, I hunt for the public killed_log.txt files and suck in data.)
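The collection side doesn't have to be fancy. Here's a minimal sketch in Python of the idea (the domain list is made up, and I'm assuming each site serves its killed_log.txt at the web root-- the real locations vary, and my actual script does more de-duplication):

    import urllib.request

    # Made-up list of domains publishing their ZB Block kill logs.
    DOMAINS = ["example-forum.com", "example-blog.net"]

    def fetch_kill_log(domain):
        """Fetch a domain's public killed_log.txt, or return None on failure."""
        url = "http://%s/killed_log.txt" % domain
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except OSError:
            return None  # unreachable, moved, or not public

    for domain in DOMAINS:
        log = fetch_kill_log(domain)
        if log is not None:
            # Append to a per-domain archive; de-duplicate later.
            with open("collected_%s.txt" % domain, "a", encoding="utf-8") as out:
                out.write(log)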
That means I'm starting to be able to search for patterns. Robert-- have a look at

http://bannasties.com/BanNastiesScripts/ShowDetailsWhy.php?Why=image&Days=90

That's a whole bunch of "things"-- probably mostly bots-- who asked for images in ways the web admins thought were "suspicious". Many are from my sites, because I am very suspicious of lots of stuff and created lots of special blocking "rules" for things that look like image scraping. Most of the other guys run forums, have no images, and run ZB Block with the default rules only (or rules that they need for their site). So any search with Why=image in it is going to find more reports from me than from anyone else.
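That Why=image search is really just a keyword filter over the pooled kill records. If you had the collected logs sitting locally, the same kind of search is only a few lines-- here's a rough Python sketch (it treats each log line as one record, which is a simplification; real killed_log.txt entries are multi-line, and a proper version would parse record boundaries and dates before filtering):

    import glob

    def search_kills(reason_keyword):
        """Print every collected log line whose text mentions the keyword."""
        for path in sorted(glob.glob("collected_*.txt")):
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    if reason_keyword.lower() in line.lower():
                        print(path, line.rstrip())

    search_kills("image")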
I'm actually hoping I can get people who start using ZB Block to publish their killed_log.txt somewhere public so I can collect more data. There are lots of ways to block, but some ways of blocking help us build collective intelligence. Other ways don't.
(If anyone wants to install ZB Block somewhere, I'd be happy to help out. I'd ask them to also put their killed_log.txt somewhere readable so my script could collect the data. I'd love, love, love it if Matt started using ZB Block. But there is a somewhat steep learning curve-- and you need your audience to understand what's going on while you are implementing it. I don't think he'd be happy with the program if he just loaded it up and didn't have someone used to the product there to figure out what to do if someone does get blocked. Sometimes the author can't anticipate which "rules" will be incompatible with certain software-- though relatively few boo-boos happen.)
"and some users like Lucia have a worldwide audience"

Yep. I have worldwide traffic-- and quite a bit of it. So I am reluctant to just block Israel. I also see scraping from other countries. It's a blog that automatically builds a decent self-linking structure and "pings", so it attracts lots of "stuff". And I have a lot of internal links (Google PageRank 6), so a lot of services end up pointed at me. I may be ideally positioned to detect patterns, new bot user agents, new IP ranges, and so on. So I'm really trying to get this collective intelligence going.
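The cross-site angle is what makes pooled logs valuable: a user agent or IP range that gets killed on several unrelated domains is almost certainly a bot, not a person having a bad day. A little Python sketch of that idea (it assumes the domain and user agent have already been parsed out of each record-- the parsing depends on the killed_log.txt layout, and the sample data is invented):

    from collections import defaultdict

    def suspicious_agents(records, min_domains=3):
        """Return user agents that were blocked on at least min_domains sites.

        records: iterable of (domain, user_agent) pairs parsed from the logs.
        """
        seen_on = defaultdict(set)
        for domain, agent in records:
            seen_on[agent].add(domain)
        return {agent: sorted(domains)
                for agent, domains in seen_on.items()
                if len(domains) >= min_domains}

    # Invented example data:
    sample = [
        ("siteA.com", "BadBot/1.0"),
        ("siteB.net", "BadBot/1.0"),
        ("siteC.org", "BadBot/1.0"),
        ("siteA.com", "Mozilla/5.0 (real browser)"),
    ]
    print(suspicious_agents(sample))  # BadBot/1.0 shows up on three sites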
On getting my info out:
I got the new domain so that I can co-locate all the stuff I have learned and organize the information in a systematic way. If you visit the link to the "outline" now, you'll see one of the items in the outline is now a hyperlink. See the 'blue' here:
http://blog.bannasties.com/controlling-exploitative-bots-aka-nasties/

My server logs show it is already being crawled. (Lots of people search on the really boring-looking stuff in some of the articles I robo-post.)
I'm going to fill in the "how to" parts first; the "why" articles will be posted later.