Soylent--
When I first started looking at how to block bots, I thought it would be easy. But I kept watching my logs, and the number of image bots is rather amazing. Also, as I blocked things, other seemingly 'adaptive' behavior became evident. For example, after I began banning lots of obvious image bots, I suddenly saw images being loaded by something with the user agent "TraumaCadX". Someone is browsing with a user agent designed to read X-rays? Really? After I blocked that, people were suddenly visiting my images with Playstations. Really? Then I started seeing visits with user agents indicating that they came from a server optimized to save images. Do I believe that's a human visiting my blog? No. I do not.
To screen image stuff, I'm mostly using ZB Block, but with a tweak that diverts most image scrapers into PHP. (The tweak is required because images are static files and ZB Block only applies to PHP. I'm not really sure how I'm going to explain it to people so that it's easy to use without screwing up your blog!)
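A minimal sketch of the general idea, assuming the web server is set up (for example with a rewrite rule) to hand image requests to a PHP script instead of serving the file directly. This is not ZB Block's code and not the exact tweak described above; the function names, the `img` parameter, the `images` directory and the short blocklist are all hypothetical.

```php
<?php
// Hypothetical image gateway: decide in PHP whether a static image may be served.

// Returns true when the user agent contains any blocked fragment (case-insensitive).
function is_blocked_agent(string $useragent, array $blockedFragments): bool
{
    $lc = strtolower($useragent);
    foreach ($blockedFragments as $fragment) {
        if ($fragment !== '' && strpos($lc, strtolower($fragment)) !== false) {
            return true;
        }
    }
    return false;
}

// Resolves $requested inside $imageDir. Returns null if the file is missing
// or if the resolved path escapes $imageDir (e.g. via "../").
function resolve_image(string $imageDir, string $requested): ?string
{
    $base = realpath($imageDir);
    $full = realpath($imageDir . '/' . $requested);
    if ($base === false || $full === false) {
        return null;
    }
    $prefix = $base . DIRECTORY_SEPARATOR;
    if (strncmp($full, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return is_file($full) ? $full : null;
}

// Only runs when called through a web server with ?img=some/file.png (assumed parameter name).
if (PHP_SAPI !== 'cli' && is_string($_GET['img'] ?? null)) {
    $blocked = ['psbot', 'picsearch', 'playstation', 'traumacadx']; // sample entries only
    $types   = ['jpg' => 'image/jpeg', 'jpeg' => 'image/jpeg', 'png' => 'image/png', 'gif' => 'image/gif'];
    $path    = resolve_image(__DIR__ . '/images', $_GET['img']);
    $ext     = strtolower(pathinfo((string) $path, PATHINFO_EXTENSION));

    if (is_blocked_agent($_SERVER['HTTP_USER_AGENT'] ?? '', $blocked)) {
        http_response_code(403);              // blocked user agent: no image
    } elseif ($path === null || !isset($types[$ext])) {
        http_response_code(404);              // missing file or not an allowed image type
    } else {
        header('Content-Type: ' . $types[$ext]);
        header('Content-Length: ' . filesize($path));
        readfile($path);                      // pass the image through unchanged
    }
}
```

The cost of this pattern is that every image request now runs PHP, so it trades some server load for the ability to apply the same blocking logic to images as to pages.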
If you use ZB Block, I'd be glad to discuss which custom sigs seem especially useful for image stripping.
The list of "image" or "suspected" things I am blocking has expanded. Below, I'll show some of the lines of code. The important part to block is the user agent (or host) string, which is the first quoted string on each line.
# bandwidth-sucking picture/copyright bots
$ax = $ax + (inmatch($lcuseragent,"psbot","; psbot ua Images. INSTA-BAN."));
$ax = $ax + (inmatch($lcuseragent,"picsearch","; picsearch ua Images.. INSTA-BAN."));
$ax = $ax + (inmatch($lcuseragent,"playstation","; playstation ua : images INSTA-BAN."));
$ax = $ax + (inmatch($lcuseragent,"pixray","; Pixray ua image bot INSTA-BAN. ")); // Pixray-Seeker/1 (Pixray-Seeker; http://www.pixray.com/pixraybot; [email protected])
$ax = $ax + (inmatch($lcuseragent,"phantom.js bot","; Phantom.js Images Scraper: INSTA-BAN. ")); //
$ax = $ax + (inmatch($lcuseragent,"upictobot","; Presumed Image Scraper(?): upictoBot. INSTA-BAN. ")); // needle is lowercase because $lcuseragent is the lowercased user agent
$ax = $ax + (inmatch($useragent,"TraumaCadX","; TraumaCadX is in your user agent. image. INSTA-BAN.")); //
$ax = $ax + (inmatch($useragent,"Copyright","; iCopyright Conductor 1.0 Nasty. INSTA-BAN.")); #
$ax = $ax + (inmatch($lcuseragent,"getty","; Image UA:getty. INSTA-BAN. ")); //
$ax = $ax + (lmatch($useragent,"Extreme Picture Finder","; Extreme Picture Finder Scraper UA. images. INSTA-BAN. ")); //
$ax = $ax + (inmatch($useragent,"BPImageWalker","; BPImageWalker. INSTA-BAN. ")); //
$ax = $ax + (inmatch($lcuseragent,"cydral","; cydral Image . INSTA-BAN. ")); //
$ax = $ax + (inmatch($lcuseragent,"doubanbot","; doubanbot Image Scraper. INSTA-BAN. ")); //
$ax = $ax + (inmatch($useragent,"CoverScout","; Album Cover Searching. images. INSTA-BAN."));
$ax = $ax + (inmatch($useragent,"ImageProHD","; ImageProHD: not a browser (3) INSTA-BAN."));
$ax = $ax + (inmatch($useragent,"WikioImagesBot","; WikioImagesBot: unknown untraceable bot. INSTA-BAN."));
$ax = $ax + (inmatch($lcuseragent,"tineye","; tineye Image ua INSTA-BAN. ")); //
$ax = $ax + (inmatch($lcuseragent,"wesee.com","; wesee.com images. I approve of filtering for adult content, but I also don't trust you. Go away. Nasty. ")); //68c ' http://www.wesee.com/en/support/bot/
$ax = $ax + (inmatch($lcuseragent,"digimarc","; Copyright bot Digimarc. images. INSTA-BAN.")); //
$ax = $ax + (inmatch($lcuseragent,"bitvo","; bitvo.com Image scraper. images. INSTA-BAN.")); //
$ax = $ax + (inmatch($useragent,"NSPlayer","; NSPlayer Images (?) ua. images. INSTA-BAN.")); # This is always just scraping and switches user agent back and forth to vlc/. (Sometimes playstation.)
$ax = $ax + (inmatch($lcuseragent,"nsplayer","; NSPlayer Images (?) ua. images. INSTA-BAN.")); # This is always just scraping and switches user agent back and forth to vlc/. (Sometimes playstation.)
$ax = $ax + (inmatch($lcuseragent,"vlc/","; vlc/ Images scraper (?) ua. images.. INSTA-BAN.")); # This is always just scraping and switches user agent back and forth to NSPlayer.
$ax = $ax + (inmatch($lcuseragent,"webcollage","; webcollage images. INSTA-BAN.")); //http://www.webcollage.com/ there is no reason to let *someone else* use webcollage on my server.
$ax = $ax + (inmatch($useragent,"Mozilla 3.01 PBWF (Win95)","; imagelock: now defunct. Shouldn't be visiting. Nasty. INSTA-BAN. ")); #'
$ax = $ax + (inmatch($useragent,"Corp_Device_User","; Corp_Device_User. Nasty. images.?")); # I have no idea what this bot is doing. It looked like the GAP looking at images.
$ax = $ax + (inmatch($useragent,"mShots","; mShots. images.")); # this is a plugin that takes screenshots of your page over and over.
$ax = $ax + (inmatch($hoster,"getty","; getty: Image host. INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"picscout","; picscout: Image host. INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"tineye","; tineye: Image host INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"prioritycolo.com","; prioritycolo.com Image host INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"bezeqint","; bezeqint: Home of picscout. Bad all around. INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"istockphoto","; istockphoto: Image host. INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"pingdom.com","; pingdom.com: Image host. INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,".ethz.ch","; .ethz.ch: Might be image host(?) INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"copyscape","; copyscape: Copyright service. Nasty. INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"googlealert","; googlealert: Copyright service; <i>is not</i> google. Nasty. INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"pixray","; pixray: Image host. INSTA-BAN.")); //
$ax = $ax + (inmatch($hoster,"baidu","; baidu: either real or spoofed. INSTA-BAN.")); //
$ax = $ax + (inmatch($lcuseragent,"portalimage","; bad ua 5 Nasty. images. ")); // http://www.webmasterworld.com/search_engine_spiders/4398144.htm
$ax = $ax + (inmatch($useragent,"SUSE",";image stripping linux browser. INSTA-BAN. ")); // look for SUSE here http://www.useragentstring.com/pages/Firefox/
$ax = $ax + (inmatch($lcuseragent,"superlumin",";image video proxy. INSTA-BAN. ")); // http://www.superlumin.com/nemesis.php http://www.superlumin.com/video.php
But there are all sorts of other bots, crawlers, spiders etc. that I don't trust. Depending on how you run your site, you need:
1) A way to block IPs, useragents and hosts.
2) A way to block these things for *images*. (.htaccess, ZBblock if tweaked, firewall, whatever.)
3) Lists of user agents, IPs and hosts to block.
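In my code above, you can see the user agents and hosts to block if you know how to read the command. As far as the lines themselves show, the first argument is what gets checked ($useragent, the lowercased $lcuseragent, or the host name in $hoster), the second is the text to look for, and the third is the reason that gets recorded for the block.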
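To make the pattern of those lines concrete, here is a small stand-alone sketch of how such a signature check could behave. `inmatch_sketch` is a hypothetical stand-in written for illustration, not ZB Block's actual `inmatch`; the host name is made up, and the user agent sample is shortened from the Pixray comment in the list.

```php
<?php
// Hypothetical stand-in for a ZB Block style signature check; NOT the real ZB Block source.
// Returns 1 and appends $reason to $whyblock when $needle occurs in $haystack, else 0.
// The comparison is case-sensitive, which is why a lowercased copy of the user agent is handy.
function inmatch_sketch(string $haystack, string $needle, string $reason, string &$whyblock): int
{
    if ($needle !== '' && strpos($haystack, $needle) !== false) {
        $whyblock .= $reason;
        return 1;
    }
    return 0;
}

$useragent   = 'Pixray-Seeker/1';        // sample user agent (shortened)
$lcuseragent = strtolower($useragent);   // lowercase copy, so needles must be lowercase too
$hoster      = 'crawler.example.com';    // made-up reverse-DNS host name

$ax = 0;          // running count of matched signatures
$whyblock = '';   // accumulated reasons

$ax = $ax + inmatch_sketch($lcuseragent, 'pixray', '; Pixray ua image bot INSTA-BAN. ', $whyblock); // matches
$ax = $ax + inmatch_sketch($hoster, 'getty', '; getty: Image host. INSTA-BAN.', $whyblock);        // no match
$ax = $ax + inmatch_sketch($lcuseragent, 'Pixray', '; mixed-case needle. ', $whyblock);            // no match: haystack is lowercase

echo $ax, PHP_EOL;        // prints 1
echo $whyblock, PHP_EOL;  // prints "; Pixray ua image bot INSTA-BAN. "
```

The third call shows why the needle's case matters: a mixed-case string tested against the lowercased user agent can never match in this sketch.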