Following the great advice given in this forum about setting up bad bot traps to harvest IPs that need to be blocked, I caught a crawler ignoring my robots.txt file. The request came from an IP address in China, with a user agent claiming to be a Baidu crawler. The address turned out to be registered by these people:
https://www.markmonitor.com/solutions/role_based_solutions-legal.php
Towards the bottom of the page they mention how they can ...
"detect infringement using the industry’s widest monitoring net"
That apparently means snooping around with incognito crawlers to see what they can find. It broadens the Intellectual Property Crawlerbot playing field beyond images to include games, media, music, and just about anything that can be branded, including text copy. Let's hope they don't intend to emulate the extortion letter business model. I think I'll block their IP anyway.
This is what you need to block to keep them out of your server:
IP address: 123.125.71.110
User Agent: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
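If you'd rather do the check in application code than in the server config, here is a minimal sketch in Python. The user agent above is the same string a genuine Baidu crawler would send, so matching on it would block the real one too; this sketch matches on the IP address only. The names `BLOCKED_NETWORKS` and `is_blocked` are made up for illustration, and how you get the client address depends on your own setup.

```python
import ipaddress

# Address reported in this post. Add your own bot-trap hits here,
# either as single addresses (/32) or as wider CIDR ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("123.125.71.110/32"),
]


def is_blocked(remote_addr, networks=BLOCKED_NETWORKS):
    """Return True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address; leave the decision to other checks.
        return False
    return any(addr in net for net in networks)
```

You would call `is_blocked()` with the client address of each request and return a 403 when it comes back true.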