ExtortionLetterInfo Forums
Getty Images Letter Forum. Topic started by: Moe Hacken on May 06, 2012, 10:52:32 PM
-
Following the great advice in this forum about setting up bad-bot traps to harvest the IPs that need to be blocked, I caught a crawler ignoring my robots.txt file. It came from a Chinese IP address and claimed to be a Baidu crawler. The IP turned out to be registered to these people:
https://www.markmonitor.com/solutions/role_based_solutions-legal.php
Towards the bottom of the page, they mention that they can ...
"detect infringement using the industry’s widest monitoring net"
This apparently means snooping around with incognito crawlers to see what they can find. It broadens the intellectual property crawlerbot playing field beyond images to games, media, music, and just about anything that can be branded, including text copy. Let's hope they don't intend to emulate the extortion letter business model. I think I'll block their IP anyway.
This is what you need to block to keep them out of your server:
IP address: 123.125.71.110
User Agent: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
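For anyone who filters in application code rather than in the web server config, here is a minimal Python sketch of that block. It assumes your framework hands you the client IP and the User-Agent header as plain strings; the names `BLOCKED_IPS`, `BLOCKED_USER_AGENTS`, and `should_block` are hypothetical. Note that matching the exact User-Agent also turns away genuine Baiduspider traffic, since the spoofer copied that string verbatim.

```python
# Hypothetical blocklists built from the crawler caught in the bot trap.
BLOCKED_IPS = {"123.125.71.110"}
BLOCKED_USER_AGENTS = {
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
}


def should_block(client_ip: str, user_agent: str) -> bool:
    """Return True if the request comes from a blocked IP or sends a blocked User-Agent."""
    return client_ip in BLOCKED_IPS or user_agent in BLOCKED_USER_AGENTS
```

Blocking by IP is the sturdier of the two checks, because any crawler can copy a User-Agent string.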
-
I block everything with "baidu" in the user agent. It's either
a) really Baidu, which provides me no value and uses up bandwidth, CPU, and memory, or
b) spoofing Baidu, which is worse than providing me no value.
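The rule above amounts to a case-insensitive substring test on the User-Agent header. A minimal Python sketch, assuming the header arrives as a string or is missing entirely; `is_baidu_user_agent` is a hypothetical helper name:

```python
from typing import Optional


def is_baidu_user_agent(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent mentions "baidu" in any letter case."""
    if not user_agent:
        # No User-Agent header: this rule has nothing to match on.
        return False
    return "baidu" in user_agent.lower()
```

Lowercasing first matters because the crawler identifies itself as "Baiduspider" with a capital B.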
There is a whole list of copyright bots out there that should be blocked.
-
Thanks, Lucia; you've shared some great advice on this topic. I only laid out the bot trap today, and it took less than an hour for this to come in. I'll keep an eye on these patterns and keep reporting them. I do get a fair amount of traffic from Baidu, which, as you say, is pretty much worthless to me.
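For anyone setting up a trap of their own, here is one common way it can work, sketched in Python under stated assumptions: a hypothetical path `/bot-trap/` is disallowed in robots.txt and reachable only through a link human visitors won't see, and the server writes Apache/Nginx "combined" format access logs. Any client that requests the trap path has ignored robots.txt, so its IP and User-Agent get collected for blocking. The names `TRAP_PATH`, `ROBOTS_TXT`, and `harvest_trap_hits` are illustrative, not taken from the advice in this forum.

```python
import re
from typing import Dict, Iterable

# Hypothetical trap path: disallowed in robots.txt, so well-behaved crawlers never fetch it.
TRAP_PATH = "/bot-trap/"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# "Combined" log format: IP ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def harvest_trap_hits(log_lines: Iterable[str]) -> Dict[str, str]:
    """Return {ip: user_agent} for every client that requested the trap path."""
    hits: Dict[str, str] = {}
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("path").startswith(TRAP_PATH):
            hits[match.group("ip")] = match.group("ua")
    return hits
```

The harvested IPs can then go straight into whatever blocklist the server uses.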