Click Official ELI Links
Get Help With Your Extortion Letter | ELI Phone Support | ELI Legal Representation Program
Show your support of the ELI website & ELI Forums through a PayPal Contribution. Thank you for supporting the ongoing fight and reporting of Extortion Settlement Demand Letters.

Author Topic: Bot trap for image browsing  (Read 3991 times)

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Bot trap for image browsing
« on: December 02, 2011, 02:03:17 PM »
Hi all,

I got an extortion letter early this week.  Oddly, I'd been working on banning bots all last month, but I hadn't been worried about images. My issue was CPU and memory, which bots were sucking up like crazy.  Images are static, so they don't cause that problem. Needless to say, I now see the need to trap bots that are racing through images. I've been reading various concerns and had some of my own. These include:

1) Not knowing for certain the current IPs of bots like PicScout's.
2) PicScout (or others in future) changing IPs.
3) Masking of user agents

etc.

So, I want to come up with a way that a blogger, web site or forum host can at least identify bots as they crawl and block them. No method is going to be 100% effective, but this morning I've been ginning up an idea based on the bot trap here:

http://danielwebb.us/software/bot-trap/

That bot trap will not work for catching some of the programmed image-crawling bots I've seen on my blog, because at least some of them aren't going to hit a php file on purpose.  They are programmed to just crawl through images, leaving php files alone. They also don't make mistakes. (I know how to catch bots that make mistakes on a WordPress blog and would know how to do it here at the forum. More on that later.)

My idea for catching what I might name "pure image browsing bots" is to do this:

1) Add directory-specific .htaccess files to the directories where I want to prevent browsing by image bots. (These would be at least in my image directories. I could put them higher up-- but I need to be sure I know how to avoid screwing up a complicated .htaccess file in that case.  Anyway, I really only want to block these guys from images.)
2) Add an image or multiple images that I *never* link on purpose to my site. These can be 1 pixel colored images or anything.  For now, call that image 'honeyPotImage.jpg'.
3) In a top-level .htaccess, send any bot trying to load 'honeyPotImage.jpg' (or any of the other unlinked images) to a bot-trap written in php.  This bot-trap is somewhat similar to the one above.
4) Add the IP of all bots sent to the trap to the appropriate htaccess files. 
5) After (4) the bots (or whoever gets trapped) will no longer be able to load images in the protected directories even when they load text. Note: because they can load text, human visitors to my blog will be able to tell me that images vanished. This will let me unban them-- taking care to do this in a way that I think will still protect me from bots.
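
The trap logic in steps (3) and (4) can be sketched roughly as below. This is only an illustration in Python, not the php trap itself, which isn't written yet. The function names, the whitelist argument and the file layout are all made up for the example, and the "Deny from" rule assumes Apache 2.2-style access control in the .htaccess files.

```python
import ipaddress
from pathlib import Path

# Hypothetical honeypot file names: images that are never linked on purpose.
HONEYPOT_NAMES = {"honeyPotImage.jpg"}


def is_honeypot_request(request_path):
    """True if the requested file name (query string ignored) is a honeypot."""
    file_name = request_path.split("?", 1)[0].rsplit("/", 1)[-1]
    return file_name in HONEYPOT_NAMES


def ban_ip(ip, htaccess_files):
    """Append a deny rule for ip to each file; return how many files changed."""
    ip = str(ipaddress.ip_address(ip))  # raises ValueError on malformed input
    rule = f"Deny from {ip}"  # assumes Apache 2.2-style access rules
    changed = 0
    for path in map(Path, htaccess_files):
        text = path.read_text() if path.exists() else ""
        if rule in text.splitlines():
            continue  # already banned in this directory
        # Make sure the new rule starts on its own line.
        separator = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as handle:
            handle.write(f"{separator}{rule}\n")
        changed += 1
    return changed


def handle_request(request_path, ip, htaccess_files, whitelist=frozenset()):
    """Ban the visitor if it asked for a honeypot image; return True if banned."""
    if not is_honeypot_request(request_path) or ip in whitelist:
        return False
    ban_ip(ip, htaccess_files)
    return True
```

Because the rule only goes into the .htaccess files of the protected image directories, a trapped visitor can still load text pages, which is what makes step (5) possible.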

FWIW: I'll be adding some whitelisted hosts to the tool. My first draft has Google and Bingbot whitelisted.
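
One possible way to do the whitelisting, sketched in Python under some assumptions: since user agents can be masked, the check goes by host name rather than by what the bot calls itself. It does a reverse DNS lookup on the IP, checks the host name suffix, then confirms the host name resolves back to the same IP. The suffix list is an assumption for the example and would need checking against what Google and Bing currently publish; the function names are made up.

```python
import socket

# Assumed host name suffixes for the whitelisted crawlers (verify before use).
WHITELISTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def _reverse_lookup(ip):
    """Return the primary host name that reverse DNS gives for ip."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host):
    """Return the list of IPv4 addresses that host resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_whitelisted(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """True only if ip reverse-resolves to a whitelisted host that maps back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # lookup failed, so treat the visitor as unknown
        return False
    if not host.endswith(WHITELISTED_SUFFIXES):
        return False
    try:
        return ip in forward(host)  # forward confirmation defeats faked reverse DNS
    except OSError:
        return False
```

The lookups are passed in as arguments so the check can be tried out with fake resolvers before pointing it at live DNS.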

I'm going to get this working for my blogs. I was wondering if others would be interested in using it once it's working? If yes, I might ask you questions to figure out how to make this user friendly. Also, if people do use it, at some point, we may want to share lists of user agents and IPs we are seeing racing through images. 

This sharing could be automated and  would help us identify any changes in IP ranges or host addresses and help people at sites 2-N ban the creepy bots as soon as they are detected at site 1.
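
As a rough idea of what the automated sharing might look like on the receiving end, here is a small Python sketch. The report format (a list of records with an "ip" field) is purely hypothetical, since nothing has been designed yet. Given the IPs a site has already banned and the reports coming from other sites, it returns the IPs that still need banning locally.

```python
import ipaddress


def new_bans(local_banned, shared_reports):
    """Return valid IPs from shared_reports that are not yet banned locally."""
    pending = set()
    for report in shared_reports:
        try:
            address = ipaddress.ip_address(report["ip"])
        except (KeyError, ValueError):
            continue  # skip records with a missing or malformed IP
        if str(address) not in local_banned:
            pending.add(address)
    # Sort by IP version, then numerically, so the output is stable.
    ordered = sorted(pending, key=lambda a: (a.version, int(a)))
    return [str(address) for address in ordered]
```

Whether a site should ban on a single report from another site, or wait for several, is a policy question the sketch leaves open.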

FWIW: Lots of people at web host forums are complaining about these bots for reasons other than concerns about getting a Getty letter. The bots just race through, suck bandwidth, clutter up server logs and are just a plain old nuisance. Because of the latter, if the system is made convenient, we might be able to get lots of people using it. But first I think I just need to  know if anyone would like to volunteer to try it in a week or two after I have it working. Actually, probably by Wed.

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: Bot trap for image browsing
« Reply #1 on: December 02, 2011, 02:32:56 PM »
This has been on my list to do as well, I just haven't had the time to try out the bot trap... that being said, I'd be willing to give it a shot when you get it completed... I need to digest this a bit more before I open my trap, but I might have some further ideas/suggestions.
Most questions have already been addressed in the forums, get yourself educated before making decisions.

Any advice is strictly that, and anything I may state is based on my opinions, and observations.
Robert Krausankas

I have a few friends around here..

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: Bot trap for image browsing
« Reply #2 on: December 02, 2011, 03:12:26 PM »
I saw this bot-trap suggested a few times.  I got to it quickly because, believe it or not, I was already working on something to bounce bots from WordPress, and I'd already been using some of the ideas behind the bot-trap.

You'll see I discussed some fiddling at my blog:
http://rankexploits.com/musings/2011/sorry-bergen-norway/

I even set up a new blog to discuss the fiddling. (Though, very little is discussed at the new one.)
http://rankexploits.com/protect/

But up until this week, I didn't see any big reason to block the bots that do nothing but load images.  I thought they were obnoxious, but they didn't spike memory or cpu.


So, suggest away. The sooner the better.  I'm not a programmer-- but I can program.  One thing I find is that it helps to have a plan before coding rather than coding away and then changing to suit a new plan. Plus, I'm perfectly capable of ignoring an idea if I think there is some reason it should be ignored.  Also, if it gets too off topic, we can move the conversation to the "new" blog, and then just post synopses here.

 

 
